ABSTRACT

We present a new method for detecting and correcting systematic errors in the distances to stars when both proper motions and line-of-sight velocities are available. The method is applicable to samples of 200 or more stars that have a significant extension on the sky. It exploits correlations between the measured U, V and W velocity components that are introduced by distance errors. We deliver a formalism to describe and interpret the specific imprints of distance errors, including spurious velocity correlations and shifts of mean motion in a sample. We take into account correlations introduced by measurement errors, Galactic rotation and changes in the orientation of the velocity ellipsoid with position in the Galaxy. Tests on pseudodata show that the method is more robust and sensitive than traditional approaches to this problem. We investigate approaches to characterising the probability distribution of distance errors, in addition to the mean distance error, which is the main theme of the paper. Stars with the most overestimated distances bias our estimate of the overall distance scale, so the corrected distances come out slightly too small. We give a formula that can be used to correct for this effect. We apply the method to samples of stars from the SEGUE survey, exploring optimal gravity cuts and sample contamination, and correcting the distance relations used.
1 INTRODUCTION

Studies of stellar kinematics in the Milky Way are of enormous importance, because they hold the key both to measuring the gravitational field of the Galaxy and to unravelling the Galaxy's history and manner of formation. Consequently, considerable resources have been, and are being, devoted to measuring the velocities of stars.

Two different techniques have to be used to measure the three components of velocity with respect to the Sun. The component $v_\parallel$ along the line of sight to the star is measured spectroscopically. The component $v_\perp$ transverse to the line of sight is determined by combining the measured proper motion $\mu$ with an estimate of the distance $s$ to the star. Over the next decade enormous numbers of distances will be obtained from parallaxes measured by the Gaia satellite. Currently, however, the great majority of distance estimates have been obtained by comparing an estimate of the star's absolute magnitude with its apparent magnitude. This process is liable to systematic error in several ways. Giants can be mistaken for subgiants or even dwarfs of the same colour (or vice versa). Assumed ages severely influence the adopted luminosities in the turn-off region, even for well-classified stars. The adopted metallicities may also be biased, as discussed by Lee et al. (2008a) and as demonstrated by the shifts in metallicity scale between Nordström et al. (2004), Holmberg et al. (2007) and Casagrande et al. (2011). An erroneous metallicity leads to the wrong isochrone being used to infer the luminosity, and an erroneous luminosity and distance follow. Further problems are that synthetic colours can be wrong (cf. the discussion in Percival & Salaris), and that stellar-evolution models can predict different luminosities for a given metallicity and effective temperature. There is evidence that these models make the main sequences of metal-poor objects too faint (the 'helium problem', discussed e.g. in Casagrande et al. 2007). Finally, erroneous extinctions may be adopted. The problems just enumerated can readily accumulate to systematic distance biases in excess of 20 per cent, so some way of independently calibrating the distance scale is invaluable.
Here we present a method for calibrating distances that exploits correlations between the measured $U$, $V$, $W$ components of velocity that are introduced by systematic distance errors. The method is applicable to any survey that provides proper motions and line-of-sight velocities over a wide area of the sky.

The idea that the typical distance to objects in a sample can be constrained by proper motions is well known in astronomy; for a useful recent review see Popowski & Gould (1998). The method of secular parallaxes determines the mean parallax of a population by combining proper motions with the known mean motion of the population with respect to the Sun (e.g. Trumpler & Weaver 1962; Binney & Merrifield 1998, §2.2.3). The method of statistical parallaxes estimates the mean parallax by combining proper motions with line-of-sight velocities (e.g. Binney & Merrifield 1998, §2.2.4). Our method shares features with both of these. Like statistical parallaxes, it hinges on comparing proper motions with line-of-sight velocities, and like secular parallaxes, it also exploits the mean motion of the stars with respect to the Sun. It is much less vulnerable than classical methods to questionable assumptions regarding the shape of the velocity ellipsoid and/or the nature of the mean velocity field (see esp. the discussion in Trumpler & Weaver 1962). By examining the way correlations between components of space velocity vary with position on the sky, we dispense with the need for prior knowledge of the mean velocity field. All we require is knowledge of the formal errors of the observables and, if the sample is sufficiently non-local, reasonable assumptions about the orientation of the velocity ellipsoid at relevant points in the Galaxy.

Section 2 lays out the basic theory for the case in which distances are all in error by a common factor. Section 3 extends the theory to the realistic case in which distance errors contain a random component. Section 4 applies the method to data from the Sloan surveys. Section 5 sums up.



2 THE MEAN DISTANCE ERROR 

We are concerned with the case in which calibration errors in the distance scale cause all distances to have a fractional error $f$. The assumed distance $s'$ to a star is then related to the true distance $s$ by

$$s' = (1+f)\,s. \tag{1}$$



Consequently the assumed tangential velocity $v'_\perp$ is related to the true tangential velocity $v_\perp$ by

$$v'_\perp = (1+f)\,v_\perp. \tag{2}$$



The velocity component along the line of sight is of course 
unaffected by distance errors. 
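A quick numerical illustration of equations (1) and (2) follows. It is a minimal sketch that assumes proper motions in mas yr$^{-1}$ and distances in kpc, so that $v_\perp = 4.74047\,\mu s$ km s$^{-1}$ (4.74047 km s$^{-1}$ is one astronomical unit per year). The star's proper motion and distance are hypothetical.

```python
import numpy as np

# One astronomical unit per year in km/s; with mu in mas/yr and the distance
# in kpc, v_perp [km/s] = K_KMS * mu * s.
K_KMS = 4.740470


def tangential_velocity(mu_mas_yr, dist_kpc):
    """Tangential velocity in km/s from a proper motion and a distance."""
    return K_KMS * np.asarray(mu_mas_yr, dtype=float) * np.asarray(dist_kpc, dtype=float)


# Hypothetical star: true distance 2 kpc, total proper motion 5 mas/yr.
s_true = 2.0
mu = 5.0
f = 0.2                          # all distances overestimated by 20 per cent
s_assumed = (1.0 + f) * s_true   # equation (1)

v_true = tangential_velocity(mu, s_true)
v_assumed = tangential_velocity(mu, s_assumed)

# The ratio equals 1 + f, as in equation (2); v_parallel would be unchanged.
print(float(v_true), float(v_assumed), float(v_assumed / v_true))
```

The line-of-sight velocity does not enter, which is why only the transverse part of the inferred space velocity is inflated.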

From $v_\parallel$ and the proper motions ($\mu_b = \dot b$, $\mu_l = \cos b\,\dot l$) parallel to each Galactic coordinate, we infer the velocity components $(U, V, W)$ in the Cartesian coordinate system in which the Sun is at rest at the origin. In this system the $U$ axis points to the Galactic centre, the $V$ axis points in the direction of Galactic rotation, and the $W$ axis points to the north Galactic pole. The relevant transformation is

$$\begin{pmatrix}U\\ V\\ W\end{pmatrix} = \mathbf{M}\begin{pmatrix}s\mu_b\\ s\mu_l\\ v_\parallel\end{pmatrix}, \tag{3}$$

where the orthogonal matrix

$$\mathbf{M} = \begin{pmatrix} -\sin b\cos l & -\sin l & \cos b\cos l\\ -\sin b\sin l & \cos l & \cos b\sin l\\ \cos b & 0 & \sin b \end{pmatrix}. \tag{4}$$
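The transformation of equations (3) and (4) can be sketched with NumPy. This is an illustration only. The function names are ours, and the tangential components are passed directly in velocity units (the products $s\mu_b$ and $s\mu_l$), so no unit conversion is implied.

```python
import numpy as np


def galactic_matrix(l, b):
    """Orthogonal matrix M of equation (4) for longitude l and latitude b (radians).

    Its columns are the unit vectors along increasing b, along increasing l and
    along the line of sight, expressed in the (U, V, W) frame.
    """
    sl, cl = np.sin(l), np.cos(l)
    sb, cb = np.sin(b), np.cos(b)
    return np.array([
        [-sb * cl, -sl, cb * cl],
        [-sb * sl, cl, cb * sl],
        [cb, 0.0, sb],
    ])


def uvw_from_observables(v_b, v_l, v_los, l, b):
    """Equation (3): (U, V, W) from s*mu_b, s*mu_l and v_parallel (all in km/s)."""
    return galactic_matrix(l, b) @ np.array([v_b, v_l, v_los])


l, b = np.radians(30.0), np.radians(60.0)
M = galactic_matrix(l, b)
is_orthogonal = np.allclose(M @ M.T, np.eye(3))

# A star with no proper motion, receding at 10 km/s, moves along the line of sight.
uvw = uvw_from_observables(0.0, 0.0, 10.0, l, b)
line_of_sight = np.array([np.cos(b) * np.cos(l), np.cos(b) * np.sin(l), np.sin(b)])
print(is_orthogonal, uvw, np.allclose(uvw, 10.0 * line_of_sight))
```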



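The velocity components inferred from distances that have fractional error $f$ are

$$\begin{pmatrix}U\\ V\\ W\end{pmatrix} = \mathbf{M}(\mathbf{I} + f\mathbf{P})\begin{pmatrix}s\mu_b\\ s\mu_l\\ v_\parallel\end{pmatrix}, \tag{5}$$

where $\mathbf{I}$ is the identity matrix and

$$\mathbf{P} = \begin{pmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 0\end{pmatrix}. \tag{6}$$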



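Hence the true components of velocity, $(U_0, V_0, W_0)$, and the measured components are related by

$$\begin{pmatrix}U\\ V\\ W\end{pmatrix} = \mathbf{M}(\mathbf{I} + f\mathbf{P})\mathbf{M}^{\rm T}\begin{pmatrix}U_0\\ V_0\\ W_0\end{pmatrix} = (\mathbf{I} + f\mathbf{T})\begin{pmatrix}U_0\\ V_0\\ W_0\end{pmatrix}, \tag{7}$$

where

$$\mathbf{T} = \mathbf{M}\mathbf{P}\mathbf{M}^{\rm T}. \tag{8}$$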

Table 1 gives an explicit expression for $\mathbf{T}$, which has direction-dependent off-diagonal elements. Consequently, when $f \neq 0$ the inferred value of $W$ depends linearly on $U_0$ and $V_0$, with coefficients that are known functions of Galactic position multiplied by $f$. By detecting these patterns of bias, we can measure the amount $f$ by which distances have been overestimated.
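The explicit form of $\mathbf{T}$ in Table 1 can be checked numerically. The sketch below evaluates $\mathbf{T} = \mathbf{M}\mathbf{P}\mathbf{M}^{\rm T}$ (equation 8) and compares it element by element with the tabulated expressions. It also checks an identity that follows from the orthogonality of $\mathbf{M}$: $\mathbf{T} = \mathbf{I} - \hat{\mathbf r}\hat{\mathbf r}^{\rm T}$, the projection onto the plane of the sky, where $\hat{\mathbf r}$ is the unit vector along the line of sight. The helper names are illustrative.

```python
import numpy as np


def galactic_matrix(l, b):
    """Matrix M of equation (4); angles in radians."""
    sl, cl, sb, cb = np.sin(l), np.cos(l), np.sin(b), np.cos(b)
    return np.array([[-sb * cl, -sl, cb * cl],
                     [-sb * sl, cl, cb * sl],
                     [cb, 0.0, sb]])


P = np.diag([1.0, 1.0, 0.0])  # equation (6): only tangential components scale with f


def T_numeric(l, b):
    """Equation (8): T = M P M^T."""
    M = galactic_matrix(l, b)
    return M @ P @ M.T


def T_table1(l, b):
    """Explicit elements of T as listed in Table 1."""
    cb2 = np.cos(b) ** 2
    s2b, s2l = np.sin(2 * b), np.sin(2 * l)
    return np.array([
        [1 - cb2 * np.cos(l) ** 2, -0.5 * cb2 * s2l, -0.5 * s2b * np.cos(l)],
        [-0.5 * cb2 * s2l, 1 - cb2 * np.sin(l) ** 2, -0.5 * s2b * np.sin(l)],
        [-0.5 * s2b * np.cos(l), -0.5 * s2b * np.sin(l), cb2],
    ])


rng = np.random.default_rng(0)
for l, b in zip(rng.uniform(0, 2 * np.pi, 5), rng.uniform(-np.pi / 2, np.pi / 2, 5)):
    r_hat = np.array([np.cos(b) * np.cos(l), np.cos(b) * np.sin(l), np.sin(b)])
    T = T_numeric(l, b)
    assert np.allclose(T, T_table1(l, b))
    assert np.allclose(T, np.eye(3) - np.outer(r_hat, r_hat))
print("Table 1 agrees with M P M^T")
```

Because $\mathbf{T}$ is a projection, it is symmetric, so $T_{WU} = T_{UW}$ and $T_{WV} = T_{VW}$, a fact used repeatedly below.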

An example makes the phenomenon we exploit clear. Consider a star at Galactic longitude $l = 0$ and latitude $b = 45^\circ$. Suppose the star's only non-zero component of velocity (in the Sun's rest frame) is $U_0 > 0$. This motion generates both a proper motion $\mu_b < 0$ and a line-of-sight velocity $v_\parallel$ away from us. If we overestimate the star's distance, the tangential velocity, which lies in the $(U, W)$ plane, will be overestimated, and we will infer a negative value for $W$ instead of zero. By the same token, a star with overestimated distance that had $U_0 < 0$ would be assigned $W > 0$. In the southern Galactic hemisphere the signs reverse: a star with overestimated distance at $b = -45^\circ$ with $U_0 > 0$ will be wrongly assigned a positive value of $W$. Hence a systematic tendency to misjudge distances can be detected by looking for correlations between velocity components that vary over the sky in predictable ways.
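The worked example above can be reproduced numerically. The sketch below uses an arbitrary, hypothetical $U_0 = 50$ km s$^{-1}$ and $f = 0.1$. It converts the true velocity into observables with $\mathbf{M}^{\rm T}$, inflates the tangential components by $1+f$ as in equation (5), and transforms back.

```python
import numpy as np


def galactic_matrix(l, b):
    """Matrix M of equation (4); angles in radians."""
    sl, cl, sb, cb = np.sin(l), np.cos(l), np.sin(b), np.cos(b)
    return np.array([[-sb * cl, -sl, cb * cl],
                     [-sb * sl, cl, cb * sl],
                     [cb, 0.0, sb]])


def apparent_uvw(true_uvw, l, b, f):
    """Observables and inferred (U, V, W) when distances are too large by 1 + f."""
    M = galactic_matrix(l, b)
    obs = M.T @ np.asarray(true_uvw, dtype=float)  # (s*mu_b, s*mu_l, v_parallel)
    obs[:2] *= 1.0 + f                             # tangential parts scale with s
    return obs, M @ obs


U0, f = 50.0, 0.1
results = {}
for b_deg in (45.0, -45.0):
    obs, uvw = apparent_uvw([U0, 0.0, 0.0], 0.0, np.radians(b_deg), f)
    results[b_deg] = (obs, uvw)
    print(f"b = {b_deg:+.0f} deg: s*mu_b = {obs[0]:.2f}, v_par = {obs[2]:.2f}, W = {uvw[2]:.2f}")
```

At $b = 45^\circ$ the star recedes ($v_\parallel > 0$) with $\mu_b < 0$, and the inferred $W$ is $fT_{WU}U_0 = -2.5$ km s$^{-1}$. At $b = -45^\circ$ the sign of $W$ flips.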

The Sun moves in the direction of Galactic rotation faster than the circular speed, and all Galactic components are subject to at least some asymmetric drift. Hence $\langle V_0\rangle < 0$ for most groups of stars, especially halo stars. Consequently, the clearest signals of an erroneous distance scale are usually correlations between the measured values of $U$ and $V$, and between $W$ and $V$.



2.1 A naive approach

Equations (7) yield

$$\begin{aligned} U &= (1 + fT_{UU})\,U_0 + fT_{UV}V_0 + fT_{UW}W_0,\\ V &= (1 + fT_{VV})\,V_0 + fT_{VU}U_0 + fT_{VW}W_0,\\ W &= (1 + fT_{WW})\,W_0 + fT_{WU}U_0 + fT_{WV}V_0. \end{aligned} \tag{9}$$
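Table 1. Explicit expression for the matrix $\mathbf{T}$ through which distance errors introduce correlations between the apparent components of velocity, and an expression for the relation between the errors in $(U, V, W)$ and in $(\mu_b, \mu_l, v_\parallel)$.

$$\mathbf{T} = \begin{pmatrix} 1 - \cos^2 b\cos^2 l & -\tfrac12\cos^2 b\sin 2l & -\tfrac12\sin 2b\cos l\\ -\tfrac12\cos^2 b\sin 2l & 1 - \cos^2 b\sin^2 l & -\tfrac12\sin 2b\sin l\\ -\tfrac12\sin 2b\cos l & -\tfrac12\sin 2b\sin l & \cos^2 b \end{pmatrix}$$

$$\begin{pmatrix}e_U\\ e_V\\ e_W\end{pmatrix} = \mathbf{M}(\mathbf{I} + f\mathbf{P})\begin{pmatrix}s\epsilon_b\\ s\epsilon_l\\ \epsilon_\parallel\end{pmatrix} = \begin{pmatrix} -\sin b\cos l\,(1+f)s\epsilon_b - \sin l\,(1+f)s\epsilon_l + \cos b\cos l\,\epsilon_\parallel\\ -\sin b\sin l\,(1+f)s\epsilon_b + \cos l\,(1+f)s\epsilon_l + \cos b\sin l\,\epsilon_\parallel\\ \cos b\,(1+f)s\epsilon_b + \sin b\,\epsilon_\parallel \end{pmatrix}$$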





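Suppose that for each hemisphere, $b > 0$ and $b < 0$, we bin stars in $l$ and in $V \simeq V_0$. For each bin we could then average the first and third equations, obtaining two equations per bin:

$$\begin{aligned} \langle U\rangle &= \langle(1 + fT_{UU})\,U_0\rangle + f\langle T_{UV}V_0\rangle + f\langle T_{UW}W_0\rangle,\\ \langle W\rangle &= \langle(1 + fT_{WW})\,W_0\rangle + f\langle T_{WU}U_0\rangle + f\langle T_{WV}V_0\rangle. \end{aligned} \tag{10}$$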

We expect the population of stars under study, taken as a whole, to be moving neither radially nor vertically. At any $(b, l)$ the mean values of $U_0$ and $W_0$ should therefore be the reflex of the solar motion $(U_\odot, W_\odot)$. With this assumption, in the $U$ equation we may set

$$\langle(1 + fT_{UU})\,U_0\rangle \simeq -(1 + f\langle T_{UU}\rangle)\,U_\odot, \qquad \langle T_{UW}W_0\rangle \simeq -\langle T_{UW}\rangle\,W_\odot, \tag{11}$$



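and similar relations can be used in the $W$ equation. Finally, we make $f$ the only unknown in equations (10) by assuming, as a first approximation, that $V_0 = V$. Because of Poisson noise, the sample values of quantities such as $\langle T_{UW}W_0\rangle$ will differ from our adopted value $-\langle T_{UW}\rangle W_\odot$, so the equations will not be satisfied exactly. We can, however, seek the values of $f$ that minimise the quantities

$$\begin{aligned} S_U &= \sum_{\rm bins}\Big[\langle U\rangle + (1 + f\langle T_{UU}\rangle)\,U_\odot - f\langle T_{UV}V\rangle + f\langle T_{UW}\rangle W_\odot\Big]^2,\\ S_W &= \sum_{\rm bins}\Big[\langle W\rangle + (1 + f\langle T_{WW}\rangle)\,W_\odot + f\langle T_{WU}\rangle U_\odot - f\langle T_{WV}V\rangle\Big]^2. \end{aligned} \tag{12}$$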

The optimum value of $f$ is used to correct the distances and the velocities derived from them. A new value of $f$ is then obtained, the distances are corrected a second time, and so on until convergence is reached.
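Each bracket in equation (12) is linear in $f$: it has the form $A + fB$. Minimising $S_W$ therefore has the closed-form solution $f = -\sum AB/\sum B^2$. The sketch below applies this to a hypothetical mock sample in which all stars share a spatially uniform Gaussian velocity ellipsoid, and shows only the first iteration. Several simplifications are assumptions rather than the paper's choices: 12 longitude bins per hemisphere with no binning in $V$, the dispersions, the mean $V_0 = -100$ km s$^{-1}$, and the solar motion $(U_\odot, W_\odot) = (11.1, 7.25)$ km s$^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(1)
U_SUN, W_SUN = 11.1, 7.25          # km/s; assumed solar motion


def T_elements(l, b):
    """Elements of T from Table 1 needed here, as arrays over stars."""
    cb2, s2b, s2l = np.cos(b) ** 2, np.sin(2 * b), np.sin(2 * l)
    return {"UU": 1 - cb2 * np.cos(l) ** 2, "UV": -0.5 * cb2 * s2l,
            "UW": -0.5 * s2b * np.cos(l), "VV": 1 - cb2 * np.sin(l) ** 2,
            "VW": -0.5 * s2b * np.sin(l), "WW": cb2}


def observed_velocities(l, b, U0, V0, W0, f):
    """Equation (9): velocities inferred with distances too large by 1 + f (T symmetric)."""
    T = T_elements(l, b)
    U = U0 + f * (T["UU"] * U0 + T["UV"] * V0 + T["UW"] * W0)
    V = V0 + f * (T["UV"] * U0 + T["VV"] * V0 + T["VW"] * W0)
    W = W0 + f * (T["UW"] * U0 + T["VW"] * V0 + T["WW"] * W0)
    return U, V, W


def naive_f(l, b, U, V, W, n_lbins=12):
    """Minimise S_W of equation (12); the bracket is A + f*B in each bin."""
    T = T_elements(l, b)
    lbin = np.minimum((l / (2 * np.pi) * n_lbins).astype(int), n_lbins - 1)
    A, B = [], []
    for hemisphere in (b > 0, b < 0):
        for k in range(n_lbins):
            m = hemisphere & (lbin == k)
            if m.sum() < 100:
                continue
            A.append(W[m].mean() + W_SUN)
            B.append(T["WW"][m].mean() * W_SUN + T["UW"][m].mean() * U_SUN
                     - (T["VW"][m] * V[m]).mean())
    A, B = np.array(A), np.array(B)
    return -np.sum(A * B) / np.sum(B * B)


n, f_true = 200_000, 0.1
l = rng.uniform(0, 2 * np.pi, n)
b = rng.uniform(-np.pi / 2, np.pi / 2, n)
U0 = -U_SUN + 50 * rng.standard_normal(n)
V0 = -100.0 + 40 * rng.standard_normal(n)
W0 = -W_SUN + 30 * rng.standard_normal(n)
f_hat = naive_f(l, b, *observed_velocities(l, b, U0, V0, W0, f_true))
print(round(f_hat, 3))
```

Because $V$ rather than $V_0$ enters $B$, the first-iteration value is slightly biased at $O(f^2)$, which is one reason the scheme is iterated.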

The scheme just described is conceptually straightforward and does work. It suffers, however, a significant loss of information, because the data must be binned and the measured values of $U$ and $W$ replaced by $-U_\odot$ and $-W_\odot$. The results shown in this paper are therefore obtained with a different scheme, which is described in the next subsection.



2.2 A more effective approach 

Our method is based on the principle that the true value of $U$ or $W$ can be decomposed into two parts: a mean velocity field of known form, with components $\bar U$ and $\bar W$, plus a random variable $\delta U$ or $\delta W$ that has zero mean. Thus $U_0 = \bar U + \delta U$, etc. In this subsection we assume that the mean velocity field may be approximated by the reflex of the solar motion, so $\bar U = -U_\odot$, etc. In Section 2.5.1 we will lift this restriction to allow for Galactic rotation. We also argue that in the second and third terms on the right of equations (9) we may replace $V_0$ by $V$. The inferred value is close to the true value, and these terms carry an explicit factor $f$, so the error made by replacing $V_0$ by $V$ is $O(f^2)$. The same argument allows us to replace $U_0$ by $U$ and $W_0$ by $W$ in these terms. With these replacements the first and third of equations (9) become



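$$\begin{aligned} U &= -U_\odot + fx + (1 + fT_{UU})\,\delta U,\\ W &= -W_\odot + fy + (1 + fT_{WW})\,\delta W, \end{aligned} \tag{13}$$

where

$$\begin{aligned} x &= -T_{UU}U_\odot + T_{UV}V + T_{UW}W,\\ y &= -T_{WW}W_\odot + T_{WU}U + T_{WV}V. \end{aligned} \tag{14}$$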



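We now remove the dependence on prior knowledge of the solar motion by subtracting from each of equations (13) its expectation value, which gives

$$\begin{aligned} U - \langle U\rangle &= f(x - \langle x\rangle) + (1 + fT_{UU})\,\delta U,\\ W - \langle W\rangle &= f(y - \langle y\rangle) + (1 + fT_{WW})\,\delta W. \end{aligned} \tag{15}$$

We determine the optimum value of $f$ by forming the sample sums¹

$$\begin{aligned} \sum_i\big[U_i - \langle U\rangle - f(x_i - \langle x\rangle)\big]\,x_i &= \sum_i (1 + fT_{UU,i})\,\delta U_i\,x_i,\\ \sum_i\big[W_i - \langle W\rangle - f(y_i - \langle y\rangle)\big]\,y_i &= \sum_i (1 + fT_{WW,i})\,\delta W_i\,y_i. \end{aligned} \tag{16}$$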



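The right side of the first equation would vanish if $\delta U$ were uncorrelated with $x$. It is correlated, however, because $x$ depends on $V$ and $W$, which in turn depend on $U_0 = -U_\odot + \delta U$. Specifically, the terms $fT_{VU}U_0$ and $fT_{WU}U_0$ in equations (9) carry $\delta U$ into $V$ and $W$, and $\mathbf{T}$ is symmetric. One easily shows that

$$\langle(1 + fT_{UU})\,\delta U\,x\rangle = f\big\langle(1 + fT_{UU})(T_{UV}^2 + T_{UW}^2)\,\delta U^2\big\rangle. \tag{17}$$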

Since we are working to $O(f)$ only, we neglect the second term in the first bracket on the right and use the resulting expression in equation (16) to solve for $f$. We find

$$f = \frac{{\rm Cov}(U, x)}{{\rm Var}(x) + \langle T_{UV}^2 + T_{UW}^2\rangle\,\sigma_U^2}, \tag{18}$$

where $\sigma_U^2 = \langle\delta U^2\rangle$, and we have identified sample means with expectation values. Analogously, from the second of equations (16) we have

$$f = \frac{{\rm Cov}(W, y)}{{\rm Var}(y) + \langle T_{WU}^2 + T_{WV}^2\rangle\,\sigma_W^2}. \tag{19}$$

¹ The accuracy with which $f$ is determined can be slightly increased in two ways: by weighting each term in the sums in equations (16) by the inverse of the expected standard deviation of the noise term, or by using Huber–White standard errors (White 1980).

Figure 1. Value of the fractional distance error $f_1$ from equation (19) versus the input value of the bias $f$ in a mock sample of 450 000 disc and 50 000 halo stars. The blue line has unit slope.

Table 2. Parameters of the mock disc and halo samples used in tests. All velocities are in km s$^{-1}$.

Component   σ_U   σ_V   σ_W   mean v_φ
Disc         55    45    35     180
Halo        150    75    75       0



As with the naive scheme, we proceed iteratively. We repeatedly correct the distances according to the value of $f$ yielded by the current distances, until $f$ becomes negligible. The precise values of the denominators in our expressions for $f$ are not important, because we rescale distances until $f \propto {\rm Cov}(U, x) = 0$ or $f \propto {\rm Cov}(W, y) = 0$. This is fortunate, as only approximate values of $\sigma_U$ and $\sigma_W$ may be available. Note that equations (18) and (19) make no reference to the solar motion. In contrast to the secular-parallax method, they therefore require information only about the shape of the mean velocity field, not its average value.
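The iterative use of equation (19) can be sketched on a hypothetical mock. The mock is a single population with a spatially uniform velocity ellipsoid, so no streaming or ellipsoid-rotation corrections are needed. Stars are placed uniformly in $l$, $b$ and distance, as in Section 2.3. Proper motions are stored as angular velocities in km s$^{-1}$ kpc$^{-1}$, which avoids the 4.74 conversion factor. The dispersions, mean velocity and solar motion are illustrative assumptions, and measurement errors are ignored.

```python
import numpy as np

rng = np.random.default_rng(42)
U_SUN, W_SUN = 11.1, 7.25      # km/s; assumed solar motion
SIGMA_W = 30.0                 # km/s; vertical dispersion of the mock


def unit_vectors(l, b):
    """Unit vectors along increasing b, increasing l and the line of sight; shape (N, 3)."""
    sl, cl, sb, cb = np.sin(l), np.cos(l), np.sin(b), np.cos(b)
    e_b = np.stack([-sb * cl, -sb * sl, cb], axis=1)
    e_l = np.stack([-sl, cl, np.zeros_like(l)], axis=1)
    e_r = np.stack([cb * cl, cb * sl, sb], axis=1)
    return e_b, e_l, e_r


def make_mock(n, f_true):
    """Observables of a mock whose catalogue distances are too large by 1 + f_true."""
    l = rng.uniform(0, 2 * np.pi, n)
    b = rng.uniform(-np.pi / 2, np.pi / 2, n)      # uniform in b: poleward bias
    s = rng.uniform(0.5, 4.0, n)                    # true distances, kpc
    vel = np.stack([-U_SUN + 50 * rng.standard_normal(n),
                    -100.0 + 40 * rng.standard_normal(n),
                    -W_SUN + SIGMA_W * rng.standard_normal(n)], axis=1)
    e_b, e_l, e_r = unit_vectors(l, b)
    mu_b = np.sum(vel * e_b, axis=1) / s            # km/s per kpc
    mu_l = np.sum(vel * e_l, axis=1) / s
    v_los = np.sum(vel * e_r, axis=1)
    return l, b, (1 + f_true) * s, mu_b, mu_l, v_los


def uvw(l, b, s, mu_b, mu_l, v_los):
    """Equation (3) for every star at the current distance scale."""
    e_b, e_l, e_r = unit_vectors(l, b)
    return ((s * mu_b)[:, None] * e_b + (s * mu_l)[:, None] * e_l
            + v_los[:, None] * e_r)


def f_from_W(l, b, vel, sigma_w):
    """Equation (19), with y from equation (14)."""
    U, V, W = vel.T
    T_WU = -0.5 * np.sin(2 * b) * np.cos(l)
    T_WV = -0.5 * np.sin(2 * b) * np.sin(l)
    T_WW = np.cos(b) ** 2
    y = -T_WW * W_SUN + T_WU * U + T_WV * V
    cov = np.mean((W - W.mean()) * (y - y.mean()))
    return cov / (np.var(y) + np.mean(T_WU ** 2 + T_WV ** 2) * sigma_w ** 2)


l, b, s, mu_b, mu_l, v_los = make_mock(200_000, f_true=0.15)
scale = 1.0                     # accumulated factor by which distances were shrunk
for _ in range(20):
    f = f_from_W(l, b, uvw(l, b, s, mu_b, mu_l, v_los), SIGMA_W)
    s = s / (1 + f)             # correct the distance scale, then re-estimate
    scale *= 1 + f
    if abs(f) < 1e-4:
        break
print(f"recovered 1 + f = {scale:.3f}")
```

For real data the corrections of Sections 2.4 and 2.5 would be subtracted from the covariance before each evaluation. Those corrections are omitted here because the mock does not need them.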

In what follows, when using equation (18) we shall call $U$ the 'target variable' and $x$ the 'explaining variable'. When we use equation (19), $W$ will be the target variable and $y$ the explaining variable.

The reader may wonder why we do not obtain a third estimate of $f$ from the $V$ equation of the set (9). The problem is that we cannot write $V_0 = -V_\odot + \delta V$ by analogy with our treatment of $U$ and $W$. Most stellar groups have mean azimuthal velocities smaller than that of the Sun, and the mean azimuthal velocity of a group in fact varies with location.

The quantities $\sum_i U_ix_i$ and $\sum_i W_iy_i$, implicit in the right sides of equations (18) and (19), contain cross-terms such as $U_iV_i$ and $U_iW_i$. As explained above, the $V$ cross-terms usually carry the most information about $f$. The exception is when $\langle V\rangle$ lies near zero, in which case the $W$ cross-terms provide the strongest constraints on $f$. $W$ is the target velocity of choice for two reasons. It has the lowest velocity dispersion, and it is least affected by streaming motions, which are largely confined to the $UV$ plane (Dehnen 1998).



2.3 Tests 

In this section we test the effectiveness of the scheme derived in the last subsection. We derive pseudo-data from a model Galaxy and analyse them in the presence of systematic distance errors. We have conducted such tests using a model obtained by adding gas and star formation to a halo formed in simulations of the cosmological clustering of collisionless particles. The results of these tests were entirely satisfactory, but we do not report them here for two reasons: (a) considerable space would be required to describe the Galaxy model with sufficient precision, and the model is in any case not entirely realistic; and (b) the model provides a rather limited number of particles in the vicinity of the Sun, so the statistical precision of the tests is inferior to that of the tests we present. Those tests use data from a highly idealised Galaxy model. This model has the flexibility to include or exclude whatever features of the data might affect the performance of our method.

Our idealised Galaxy model has a non-rotating halo and a rotating disc. The velocity ellipsoids of both components are triaxial Gaussians; Table 2 gives the values of the dispersions. The mean rotation velocity of the disc is taken to be 180 km s$^{-1}$, and the circular speed is 220 km s$^{-1}$. The sampled stars are distributed uniformly in distance between 0.5 and 4 kpc, and uniformly in Galactic longitude and latitude. This gives the sample a strong poleward bias, which resembles the bias in real samples better than an isotropically distributed sample would. In addition, the solar motion is offset by the velocity vector relative to the local standard of rest determined by Schönrich et al. (2010).

The crosses in Fig. 1 show the value of $f$ recovered from equation (19) on the first iteration, $f_1$, versus the preset fractional distance overestimate $f$ applied to the sample. Each cross is an independent realisation of the pseudo-data, which contained 450 000 disc and 50 000 halo stars. As we would hope, the crosses fall on a curve that passes through (0, 0). The straight line through the origin with unit slope is also plotted. For $|f| < 0.2$ the slope of the curve is close to unity, so the iterative procedure converges rapidly. The key point, however, is that the curve passes through the origin and has no point of inflection. So long as these conditions are satisfied, the iterative scheme will converge on the correct distance scale regardless of the slope or curvature of the curve.

Fig. 2 shows that the method works well even in the absence of solar motion. It gives results analogous to those of Fig. 1 for a sample of stars that has no net motion with respect to the Sun and an isotropic velocity distribution centred on the solar motion, i.e. without any systematic offset. The small difference between the estimators based on $U$ and on $W$ arises from second-order effects in $f$ caused by the poleward bias in the sample geometry. A simple linear regression of $W$ on $y$ from equation (13) would give the right zero point, and so eventually an unbiased distance estimate. Its denominator, however, is just ${\rm Var}(y)$, without the correction term of equation (19). As a result it would give a slope a factor of 2 too large, and hence poor convergence of the iteration.
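The factor-of-two statement can be checked with a small simulation. The setup is assumed rather than taken from the paper's mock: isotropic velocities with a 50 km s$^{-1}$ dispersion and zero mean, no solar motion, $f = 0.05$, and directions uniform in $l$ and $b$. The apparent velocities are generated with $\mathbf{T} = \mathbf{I} - \hat{\mathbf r}\hat{\mathbf r}^{\rm T}$, which follows from equation (8) and the orthogonality of $\mathbf{M}$.

```python
import numpy as np

rng = np.random.default_rng(7)
n, f_true, sigma = 500_000, 0.05, 50.0

l = rng.uniform(0, 2 * np.pi, n)
b = rng.uniform(-np.pi / 2, np.pi / 2, n)
r_hat = np.stack([np.cos(b) * np.cos(l), np.cos(b) * np.sin(l), np.sin(b)], axis=1)
v0 = sigma * rng.standard_normal((n, 3))          # isotropic, no net motion

# Equation (7) with T v = v - r_hat (r_hat . v): only the tangential part is inflated.
tangential = v0 - r_hat * np.sum(r_hat * v0, axis=1, keepdims=True)
U, V, W = (v0 + f_true * tangential).T

T_WU = -0.5 * np.sin(2 * b) * np.cos(l)
T_WV = -0.5 * np.sin(2 * b) * np.sin(l)
y = T_WU * U + T_WV * V                           # equation (14) with W_sun = 0

cov_wy = np.mean((W - W.mean()) * (y - y.mean()))
f_ols = cov_wy / np.var(y)                        # plain regression slope of W on y
f_eq19 = cov_wy / (np.var(y) + np.mean(T_WU ** 2 + T_WV ** 2) * sigma ** 2)
print(f"OLS: {f_ols:.4f}  equation (19): {f_eq19:.4f}  ratio: {f_ols / f_eq19:.2f}")
```

Why the factor is two: with no mean motion, ${\rm Var}(y)$ and the correction term $\langle T_{WU}^2 + T_{WV}^2\rangle\sigma_W^2$ are nearly equal. Omitting the correction term therefore roughly halves the denominator.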
Figure 2. Value of the fractional distance error $f$ from equation (18) or (19) versus the input value of $f$ for samples of 500 000 stars with an isotropic velocity distribution and no solar motion. The line of unit slope is also shown.



2.4 Impact of random errors 

We now consider the impact of random measurement errors on our technique. If we measured $U$, $V$ and $W$ directly, random errors would have no impact, because they would simply inflate the scatter that these variables already have through the stars' random velocities. Unfortunately, we do not measure $U$, $V$, $W$ directly. We calculate them from the measured values of $\mu_b$, $\mu_l$ and $v_\parallel$, so an error in, say, $\mu_l$ introduces correlated errors into both $U$ and $V$. Our technique attributes correlations between $U$ and $x$ (which depends on $V$) to a non-zero value of $f$. To estimate $f$ correctly, we must therefore account for the contribution of the errors to the correlations between $U$ and $x$, or between $W$ and $y$.

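Our key assumption is that the errors in $\mu_b$, $\mu_l$ and $v_\parallel$ are statistically independent, have vanishing mean, and have finite and approximately known variances. Let $\epsilon_b$, $\epsilon_l$, $\epsilon_\parallel$ be the errors in the proper motions and line-of-sight velocity. The random errors in $U$, $V$, $W$ are then

$$\begin{pmatrix}e_U\\ e_V\\ e_W\end{pmatrix} = \mathbf{M}(\mathbf{I} + f\mathbf{P})\begin{pmatrix}s\epsilon_b\\ s\epsilon_l\\ \epsilon_\parallel\end{pmatrix}. \tag{20}$$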



Table 1 gives an explicit expression for the right side of this equation.

Consider now the correlation between a target variable, say $W$, and the explaining variable $y$. Let $W = W' + e_W$ and $y = y' + e_y$. The primed variables are the error-free components, and $e_W$, $e_y$ are their errors derived from equation (20). Then

$$\langle Wy\rangle = \langle W'y'\rangle + \langle e_We_y\rangle + \langle W'e_y\rangle + \langle e_Wy'\rangle. \tag{21}$$



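Since the errors are unbiased, correlations between the true velocities and the errors, such as $\langle W'e_y\rangle$, vanish. Consequently the change in $f$ that the errors introduce through equation (19) is

$$\Delta f = \frac{\langle e_We_y\rangle}{{\rm Var}(y) + \langle T_{WU}^2 + T_{WV}^2\rangle\,\sigma_W^2} - \frac{f\,\langle e_y^2\rangle}{{\rm Var}(y) + \langle T_{WU}^2 + T_{WV}^2\rangle\,\sigma_W^2}. \tag{22}$$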



The second term on the right side is smaller than the first, and it vanishes altogether as we iterate towards $f = 0$. We therefore neglect it. With this term neglected, we obtain the error-corrected value of $f$ simply by subtracting $\langle e_We_y\rangle$ from the measured value of $\langle Wy\rangle$ before inserting it into equation (19).

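We now calculate the error-error correlations. From equations (14) we have

$$\begin{aligned} \langle e_Ue_x\rangle &= T_{UV}\langle e_Ue_V\rangle + T_{UW}\langle e_Ue_W\rangle,\\ \langle e_We_y\rangle &= T_{WU}\langle e_We_U\rangle + T_{WV}\langle e_We_V\rangle. \end{aligned} \tag{23}$$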



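Using Table 1 to express these errors in terms of the (uncorrelated) errors in the observables, and writing $\epsilon_b^2$, $\epsilon_l^2$, $\epsilon_\parallel^2$ for the error variances, we find

$$\begin{aligned} \langle e_Ue_V\rangle &= \tfrac12\sin 2l\,\big[(1+f)^2s^2(\sin^2 b\,\epsilon_b^2 - \epsilon_l^2) + \cos^2 b\,\epsilon_\parallel^2\big],\\ \langle e_Ue_W\rangle &= -\tfrac12\sin 2b\cos l\,\big[(1+f)^2s^2\epsilon_b^2 - \epsilon_\parallel^2\big],\\ \langle e_We_V\rangle &= -\tfrac12\sin 2b\sin l\,\big[(1+f)^2s^2\epsilon_b^2 - \epsilon_\parallel^2\big]. \end{aligned} \tag{24}$$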

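From the definition of $f$ we see that these terms depend only on the measured distance $s' = (1+f)s$. We can therefore correct for proper-motion errors before determining $f$. Finally, using the explicit form of $\mathbf{T}$ from Table 1, we obtain our correction terms:

$$\begin{aligned} \langle e_Ue_x\rangle &= -\tfrac14\Big\{\cos^2 b\,\sin^2 2l\,\big[s'^2(\sin^2 b\,\epsilon_b^2 - \epsilon_l^2) + \cos^2 b\,\epsilon_\parallel^2\big] - \sin^2 2b\,\cos^2 l\,\big[s'^2\epsilon_b^2 - \epsilon_\parallel^2\big]\Big\},\\ \langle e_We_y\rangle &= \tfrac14\sin^2 2b\,\big[s'^2\epsilon_b^2 - \epsilon_\parallel^2\big]. \end{aligned} \tag{25}$$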
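Equation (25) can be checked by Monte Carlo at a single sky position, in the spirit of the left panels of Fig. 3. The position, measured distance and error sizes below are hypothetical. Proper-motion errors are expressed as angular velocities (km s$^{-1}$ kpc$^{-1}$), so that $s'\epsilon_b$ is directly in km s$^{-1}$. The error vectors follow Table 1, written in terms of $s' = (1+f)s$.

```python
import numpy as np

rng = np.random.default_rng(3)


def velocity_errors(l, b, s_meas, eps_b, eps_l, eps_par):
    """Errors (e_U, e_V, e_W) from Table 1, with (1 + f)s replaced by s'."""
    sl, cl, sb, cb = np.sin(l), np.cos(l), np.sin(b), np.cos(b)
    t_b, t_l = s_meas * eps_b, s_meas * eps_l
    e_U = -sb * cl * t_b - sl * t_l + cb * cl * eps_par
    e_V = -sb * sl * t_b + cl * t_l + cb * sl * eps_par
    e_W = cb * t_b + sb * eps_par
    return e_U, e_V, e_W


def predicted(l, b, s_meas, sig_b, sig_l, sig_par):
    """Correction terms <e_U e_x> and <e_W e_y> of equation (25)."""
    s2 = s_meas ** 2
    cb2, sb2 = np.cos(b) ** 2, np.sin(b) ** 2
    bracket = s2 * sig_b ** 2 - sig_par ** 2
    e_Ue_x = -0.25 * (cb2 * np.sin(2 * l) ** 2
                      * (s2 * (sb2 * sig_b ** 2 - sig_l ** 2) + cb2 * sig_par ** 2)
                      - np.sin(2 * b) ** 2 * np.cos(l) ** 2 * bracket)
    e_We_y = 0.25 * np.sin(2 * b) ** 2 * bracket
    return e_Ue_x, e_We_y


l, b, s_meas = np.radians(40.0), np.radians(35.0), 2.5
sig_b, sig_l, sig_par = 2.0, 3.0, 2.0
n = 1_000_000
e_U, e_V, e_W = velocity_errors(l, b, s_meas,
                                sig_b * rng.standard_normal(n),
                                sig_l * rng.standard_normal(n),
                                sig_par * rng.standard_normal(n))

# Off-diagonal elements of T from Table 1 (T is symmetric).
T_UV = -0.5 * np.cos(b) ** 2 * np.sin(2 * l)
T_UW = -0.5 * np.sin(2 * b) * np.cos(l)
T_VW = -0.5 * np.sin(2 * b) * np.sin(l)
e_x = T_UV * e_V + T_UW * e_W        # error part of x, equation (14)
e_y = T_UW * e_U + T_VW * e_V        # error part of y, equation (14)

mc = (np.mean(e_U * e_x), np.mean(e_W * e_y))
pred = predicted(l, b, s_meas, sig_b, sig_l, sig_par)
print("Monte Carlo:", np.round(mc, 2), " equation (25):", np.round(pred, 2))
```

In an analysis, the predicted terms would be averaged over the sample and subtracted from ${\rm Cov}(U, x)$ or ${\rm Cov}(W, y)$ before equation (18) or (19) is evaluated. Note that $\epsilon_l$ drops out of $\langle e_We_y\rangle$.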



The left panels in Fig. 3 show $\langle e_We_y\rangle$ as a function of the errors in the line-of-sight velocities (upper panel) and the errors in proper motions (lower panel). In the upper panels the proper-motion data are error-free, while in the lower panels the line-of-sight velocities are error-free. All points come from realisations of a Monte Carlo sample of 450 000 disc stars and 50 000 halo stars drawn from the model described by Table 2. The agreement between the analytic formula and the Monte Carlo results is perfect. Because most of the stars are distant, the proper-motion errors produce much larger values of $\langle e_We_y\rangle$ than the errors in $v_\parallel$, which will be negligible for most present-day samples. The right panels of Fig. 3 show the shifts in $f_1$ (red crosses) that arise from the correlations plotted on the left. For small errors the uncorrected values of $f_1$ grow quadratically, as expected from equation (25). For larger errors, the growth of the denominator on the right of equation (19) slows the growth in $|f_1|$. The blue crosses show the values of $f_1$ obtained when we correct our estimate according to equation (25). The green squares in the bottom right panel show the case in which we vary the proper-motion error at a fixed line-of-sight velocity error of 30 km s$^{-1}$. The error effects add linearly, and our formalism accounts for them perfectly (purple circles). We also checked that, as predicted, the error in $\mu_l$ does not affect our distance estimate when $W$ is the target.

The largest uncertainty in the corrections given by equations (25) lies in the assessment of the measurement errors. The model data used above include remote disc stars, whose errors are larger than will often be encountered in practice. This test therefore suggests that it should be possible to correct for the effects of measurement errors in most samples.
Figure 3. Tests of the effects of random measurement errors in a mock sample of 500 000 stars, of which 50 000 are halo stars. The left and centre panels show how the values of ${\rm Cov}(W, y)$ and ${\rm Var}(y)$ are affected by measurement errors. The right panels show the values of $f_1$ from equation (19) for these samples, with and without the correction terms. In the upper row the proper motions are error-free and the horizontal axis gives the error in $v_\parallel$. In the lower row $v_\parallel$ is error-free (red and green crosses) and the horizontal axis gives the error in proper motions. In the bottom right panel we added the case of a fixed radial-velocity error of 30 km s$^{-1}$ (green squares and purple circles), to demonstrate the simple superposition of the error-correlation terms on the covariance.



2.5 Rotation of the velocity ellipsoid

Regardless of a star's location, we have been decomposing its velocity into Cartesian components in the frame aligned with the Sun-centre line. This frame is not aligned with the principal axes of the velocity ellipsoid at the location of a distant star. We therefore expect non-vanishing values of $\langle UV\rangle$, etc., even in the absence of distance errors. We now address this issue.

Let the components of velocity of any star along the principal axes of its local velocity ellipsoid be $(U_g, V_g, W_g)$. The angles $\alpha$ and $\beta$ are defined as shown in Fig. 4. The angle $\alpha$, between the Sun-centre line and the projection onto the plane of the velocity ellipsoid's long axis, is then given by [...]

Both observation (Siebert et al. 2011) and theory (Binney & McMillan 2011) suggest that $\alpha$ and $\beta$ take values close to the Galactocentric azimuth and latitude of the location in question. In our tests we assume that these relations are exact.

Figure 4. The definition of Galactic coordinates, heliocentric velocities and the angles $\alpha$ and $\beta$. GC signifies the Galactic Centre. The purple ellipse depicts the direction of the radially oriented main axis of the velocity ellipsoid (along $U_g$), which defines $\beta$.



2.5.1 Correlations from mean streaming

A major contribution to the velocity components $U$ and $V$ comes from the azimuthal streaming of stars, which we take to have magnitude $v_\phi(R, z)$. This motion invalidates our earlier assumption that $U_0 = -U_\odot + \delta U$. Instead we now have $U_0 = -U_\odot + \bar U + \delta U$, where $\bar U$ is the $U$ component of the velocity field given by $v_\phi$ at the star's location. Specifically,

$$\bar U(s, l, b) = v_\phi\sin\alpha, \tag{29}$$

which depends on $(s, l, b)$ both through $\alpha$ and through the (generally unknown) dependence of $v_\phi$ on $(R, z)$. Unfortunately, both $T_{UV}$ and $\bar U$ are odd functions of $l$. Correlations contributed by distance errors can therefore be mistaken for the effects of differential rotation, and vice versa. For this reason $W$, to which $v_\phi$ does not contribute, is a more useful target velocity than $U$. It is nonetheless worthwhile to consider how $U$ can be targeted.
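The geometry behind equation (29) and the expansion used in equation (33) can be sketched as follows. In line with the assumption adopted for the tests, $\alpha$ is taken to be the Galactocentric azimuth of the star, measured from the Sun-centre line. The solar radius $R_0 = 8$ kpc and the star's coordinates are assumed values for illustration.

```python
import numpy as np

R0 = 8.0   # kpc; assumed Galactocentric radius of the Sun (illustrative)


def sin_cos_alpha(s, l, b):
    """Galactocentric azimuth alpha of a star at (s, l, b), measured from the
    Sun-centre line, returned as (sin alpha, cos alpha)."""
    x = s * np.cos(b) * np.cos(l)            # towards the Galactic centre
    y = s * np.cos(b) * np.sin(l)            # in the direction of rotation
    R = np.hypot(R0 - x, y)                  # cylindrical Galactocentric radius
    return y / R, (R0 - x) / R


def U_bar(s, l, b, v_phi):
    """Equation (29): U component of azimuthal streaming at the star's position."""
    return v_phi * sin_cos_alpha(s, l, b)[0]


def sin_alpha_first_order(s_meas, l, b, f):
    """Right side of equation (33) without the O(f^2) term."""
    sin_a, cos_a = sin_cos_alpha(s_meas, l, b)
    correction = (cos_a ** 3 * s_meas * R0 * np.sin(l) * np.cos(b)
                  / (R0 - s_meas * np.cos(l) * np.cos(b)) ** 2)
    return sin_a - f * correction


s_true, l, b, f = 3.0, np.radians(50.0), np.radians(20.0), 0.01
s_meas = (1 + f) * s_true
exact = sin_cos_alpha(s_true, l, b)[0]
naive = sin_cos_alpha(s_meas, l, b)[0]
expanded = sin_alpha_first_order(s_meas, l, b, f)
print(exact - naive, exact - expanded)   # the expansion removes the O(f) error
# U_bar is odd in l, like T_UV:
print(U_bar(2.0, np.radians(30.0), 0.1, 200.0), U_bar(2.0, np.radians(-30.0), 0.1, 200.0))
```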
The first of equations (13) now becomes

$$U = -U_\odot + v_\phi\sin\alpha + fx' + (1 + fT_{UU})\,\delta U. \tag{30}$$

To determine $f$ from these equations we assume that

$$v_\phi \simeq \Theta\,g(R, z), \tag{31}$$

where $g(R, z)$ describes how $v_\phi$ varies with position, and $\Theta = v_\phi(R_0, 0)$ is the local streaming velocity of the population under study. In the simplest case we assume that $g$ has no dependence on $R$, and we estimate its dependence on $z$ from the data using the current distance scale. Once $g$ has been chosen and a preliminary value for $\Theta$ adopted, we can determine the value of $x$ for each star. We primed $x$ in equation (30) because it contains the mean motion in $U$, so we have to split off the rotation term:

$$x' = x + T_{UU}\,\Theta\,g(R, z)\sin\alpha. \tag{32}$$



The distance error $f$ causes the measured angle $\alpha'$ to deviate from the true value $\alpha$. We can correct for this effect by Taylor-expanding $\alpha'(f)$:

$$\sin\alpha = \sin\alpha' - f\cos^3\alpha'\,\frac{s'R_0\sin l\cos b}{(R_0 - s'\cos l\cos b)^2} + O(f^2). \tag{33}$$
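Then

$$U = -U_\odot + \Theta p + fx + f\Theta k + (1 + fT_{UU})\,\delta U, \tag{34}$$

where

$$\begin{aligned} p &= g(R, z)\sin\alpha',\\ k &= T_{UU}\,p - g(R, z)\cos^3\alpha'\,\frac{s'R_0\sin l\cos b}{(R_0 - s'\cos l\cos b)^2}. \end{aligned} \tag{35}$$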



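We can now proceed exactly as in the derivation of equation (18). We first subtract from equation (34) its expectation value to obtain

$$U - \langle U\rangle - \Theta(p - \langle p\rangle) - f(x - \langle x\rangle) - f\Theta(k - \langle k\rangle) = (1 + fT_{UU})\,\delta U, \tag{36}$$

and then we multiply by $x_i$ and by $p_i$ and sum over our sample. With the abbreviation $S_{a,b} = {\rm Cov}(a, b)$, this gives two equations for the unknowns $f$ and $\Theta$:

$$\begin{aligned} S_{U,x} - \Theta S_{p,x} - fS_{x,x} - \Theta fS_{k,x} &= f\langle T_{UV}^2 + T_{UW}^2\rangle\,\sigma_U^2,\\ S_{U,p} - \Theta S_{p,p} - fS_{p,x} - \Theta fS_{k,p} &= 0. \end{aligned} \tag{37}$$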



Inserting $\Theta$ from the second equation into the first and dropping all terms of order $f^2$, we obtain our estimator

$$f = \frac{S_{U,x}S_{p,p} - S_{U,p}S_{p,x}}{S_{x,x}S_{p,p} - S_{p,x}^2 + S_{k,x}S_{U,p} - S_{U,x}S_{k,p} + \tilde\sigma_U^2\,S_{p,p}}, \tag{38}$$

where $\tilde\sigma_U^2 = \langle T_{UV}^2 + T_{UW}^2\rangle\,\sigma_U^2$. A quick calculation shows that the third and fourth terms in the denominator can be neglected: they are in general small and affect only the slope. Again we solve these equations iteratively, at each iteration updating the distances and recalculating $x$, $\alpha$ and $g$ for each star.
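Equation (38) is simple to evaluate once $x$, $p$ and $k$ are known for each star. The sketch below implements it and then recovers $\Theta$ from the second of equations (37). It checks the result on hypothetical synthetic arrays built directly from equation (34). In that synthetic test $\delta U$ is drawn independently of $x$, so $\tilde\sigma_U^2$ is set to zero.

```python
import numpy as np


def cov(a, b):
    """Sample covariance S_{a,b} (population normalisation)."""
    return np.mean((a - a.mean()) * (b - b.mean()))


def f_and_theta(U, x, p, k, sigma_tilde_sq):
    """Equation (38) for f, then Theta from the second of equations (37).

    sigma_tilde_sq stands for <T_UV^2 + T_UW^2> sigma_U^2.
    """
    S_Ux, S_Up = cov(U, x), cov(U, p)
    S_xx, S_pp, S_px = cov(x, x), cov(p, p), cov(p, x)
    S_kx, S_kp = cov(k, x), cov(k, p)
    num = S_Ux * S_pp - S_Up * S_px
    den = S_xx * S_pp - S_px ** 2 + S_kx * S_Up - S_Ux * S_kp + sigma_tilde_sq * S_pp
    f = num / den
    theta = (S_Up - f * S_px) / (S_pp + f * S_kp)
    return f, theta


# Hypothetical synthetic data obeying equation (34).
rng = np.random.default_rng(11)
n, f_true, theta_true = 400_000, 0.05, 200.0
x = 30.0 * rng.standard_normal(n)
p = 0.3 * rng.standard_normal(n) + 0.005 * x      # p partly correlated with x
k = 0.3 * rng.standard_normal(n)
dU = 40.0 * rng.standard_normal(n)
U = -11.1 + theta_true * p + f_true * x + f_true * theta_true * k + dU
f_hat, theta_hat = f_and_theta(U, x, p, k, sigma_tilde_sq=0.0)
print(round(f_hat, 3), round(theta_hat, 1))
```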
Figure 5. The effect of including the corrections for rotation of the velocity ellipsoid. Two samples are used: one has just 500 000 disc stars, and the other has 450 000 disc plus 50 000 halo stars. Both samples are strongly affected by rotation of the velocity ellipsoid, yet the correction successfully shifts the points so that they pass through the origin. As expected, the sample with a halo contribution is less strongly affected.



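2.5.2 Correlations from random velocities

In the heliocentric frame the random component $\delta U = U_0 + U_\odot - \bar U$ is correlated with $\delta V = V_0 + V_\odot - \bar V$. The reason is that the velocity ellipsoid at the star's location is not aligned with that at the Sun's location, so the rotation matrix $\mathbf{R}(\alpha, \beta)$ of equation (28) is non-trivial. Consequently, when we calculate $\langle U\,T_{UV}\,V\rangle$ in the course of evaluating $\langle Ux\rangle$, the correlation will exceed the one we want by $\langle\delta U\,T_{UV}\,\delta V\rangle$. We now determine the size of this correlation, so that we can subtract it from the correlations obtained from the data before determining $f$. The principal-axis components are uncorrelated (e.g. $\langle\delta U_g\,\delta V_g\rangle = 0$). Denoting their dispersions by $\sigma_U$, $\sigma_V$, $\sigma_W$, we have

$$\begin{aligned} \langle\delta U\,T_{UV}\,\delta V\rangle &= \big\langle(\delta U_g\cos\alpha\cos\beta + \delta V_g\sin\alpha + \delta W_g\cos\alpha\sin\beta)\\ &\qquad\times T_{UV}\,(-\delta U_g\sin\alpha\cos\beta + \delta V_g\cos\alpha - \delta W_g\sin\alpha\sin\beta)\big\rangle\\ &= \tfrac14\big\langle\cos^2 b\,\sin 2l\,\sin 2\alpha\,(\cos^2\beta\,\sigma_U^2 - \sigma_V^2 + \sin^2\beta\,\sigma_W^2)\big\rangle. \end{aligned} \tag{39}$$

Similar calculations yield the additional correlations

$$\langle\delta U\,T_{UW}\,\delta W\rangle = \tfrac14\big\langle\sin 2\beta\,\sin 2b\,\cos l\,\cos\alpha\,(\sigma_U^2 - \sigma_W^2)\big\rangle \tag{40}$$

and

$$\langle\delta V\,T_{VW}\,\delta W\rangle = \tfrac14\big\langle\sin 2b\,\sin l\,\sin\alpha\,\sin 2\beta\,(\sigma_W^2 - \sigma_U^2)\big\rangle. \tag{41}$$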
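Equations (39) to (41) can be checked by rotating Gaussian principal-axis velocities into the $(U, V, W)$ frame. The rows for $\delta U$ and $\delta V$ are those written out in equation (39). The $\delta W$ row, $(-\sin\beta, 0, \cos\beta)$, is our assumption: it completes an orthonormal matrix and reproduces the signs of equations (40) and (41). The sky position and angles are illustrative, and the dispersions are those of the disc in Table 2.

```python
import numpy as np

rng = np.random.default_rng(5)
sig_U, sig_V, sig_W = 55.0, 45.0, 35.0            # principal-axis dispersions (disc, Table 2)
l, b = np.radians(60.0), np.radians(30.0)
alpha, beta = np.radians(10.0), np.radians(15.0)
n = 2_000_000

dUg = sig_U * rng.standard_normal(n)
dVg = sig_V * rng.standard_normal(n)
dWg = sig_W * rng.standard_normal(n)

ca, sa, cbe, sbe = np.cos(alpha), np.sin(alpha), np.cos(beta), np.sin(beta)
dU = dUg * ca * cbe + dVg * sa + dWg * ca * sbe
dV = -dUg * sa * cbe + dVg * ca - dWg * sa * sbe
dW = -dUg * sbe + dWg * cbe                       # assumed third row (see lead-in)

# Off-diagonal elements of T (Table 1) at this sky position.
T_UV = -0.5 * np.cos(b) ** 2 * np.sin(2 * l)
T_UW = -0.5 * np.sin(2 * b) * np.cos(l)
T_VW = -0.5 * np.sin(2 * b) * np.sin(l)

monte_carlo = np.array([T_UV * np.mean(dU * dV),
                        T_UW * np.mean(dU * dW),
                        T_VW * np.mean(dV * dW)])
analytic = 0.25 * np.array([
    np.cos(b) ** 2 * np.sin(2 * l) * np.sin(2 * alpha)
    * (cbe ** 2 * sig_U ** 2 - sig_V ** 2 + sbe ** 2 * sig_W ** 2),   # equation (39)
    np.sin(2 * beta) * np.sin(2 * b) * np.cos(l) * ca
    * (sig_U ** 2 - sig_W ** 2),                                        # equation (40)
    np.sin(2 * b) * np.sin(l) * sa * np.sin(2 * beta)
    * (sig_W ** 2 - sig_U ** 2),                                        # equation (41)
])
print(np.round(monte_carlo, 1), np.round(analytic, 1))
```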

The red and blue points in Fig. 5 show what happens if one ignores azimuthal streaming and rotation of the velocity ellipsoids when determining $f$. They plot the value of $f$ recovered from equation (38) (vertical axis) against the input value of $f$. The red points do not pass through the origin, so the estimated value of $f$ is non-zero even when the distances are in fact correct. The green and blue points show that when the formulae above are used to subtract the contributions from velocity-ellipsoid rotation to the measured correlations, the points pass through the origin, as required. The mock data used in these tests consisted, in one case, of 450 000 disc stars and 50 000 stars belonging to a non-rotating halo, and in the other case of a pure disc sample of 500 000 objects.



2.6 Components with extreme velocities 

Samples of halo stars generally have large mean $V$ velocities
relative to the Sun. So long as we are confident that the
sample means of $U_0$ and $W_0$ are far smaller than that of $V_0$,
we can greatly simplify the analysis of the sample. While
some samples of high-velocity stars may show a degree of
radial streaming on account of the Hercules stream, the
only indication of streaming in the vertical direction is a
very small correlation between $V$ and $W$ that was detected
in the Hipparcos proper motions by Dehnen (1998) and
interpreted by him as the signature of the Galactic warp.
We proceed under the assumption that $\langle U_0\rangle = -U_\odot$ and
$\langle W_0\rangle = -W_\odot$.

At each point on the sky we imagine taking the sample
mean of the third of equations (9) to obtain
$$\langle W\rangle + (1 + fT_{WW})W_\odot + fT_{WU}U_\odot = fT_{WV}\langle V_0\rangle. \qquad (42)$$



On the left we neglect terms of order $f$ and redetermine $f$
as the value that gives the least-squares fit between the
functions of sky coordinates $\langle W\rangle + W_\odot$ and $T_{WV}\langle V_0\rangle \simeq
T_{WV}\langle V\rangle$. The formal error in the recovered value is
$$e_f \simeq \frac{\sigma_W}{\sigma_{T_{WV}}\,|\langle V\rangle|\,\sqrt{N}}, \qquad (43)$$
where $N$ is the number of bins on the sky. For a typical sample
$\sigma_{T_{WV}} \sim 0.2$, and for halo stars we have $\langle V\rangle \sim 250$ km s$^{-1}$
and $\sigma_W \sim 100$ km s$^{-1}$, so $e_f \sim 2/\sqrt{N}$, which gives an error
in $f$ of $e_f \sim 6.3$ per cent for a sample of 1000 objects (taking $N = 1000$). We can reduce
this error by using the corresponding equation for $\langle U\rangle$; the
reduction is by a factor slightly smaller than $\sqrt{2}$ because
$\sigma_U > \sigma_W$.
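Equation (43) and the quoted numbers can be checked with a few lines of arithmetic. In this sketch the function names are hypothetical, and the factor for combining the $\langle W\rangle$ and $\langle U\rangle$ estimates assumes independent errors and a sky spread of $T_{UV}$ equal to that of $T_{WV}$, so that each estimate's error scales with the corresponding velocity dispersion.

```python
import math


def formal_error_f(sigma_w, sigma_twv, mean_v, n):
    """Eq. (43): e_f = sigma_W / (sigma_{T_WV} |<V>| sqrt(N))."""
    return sigma_w / (sigma_twv * abs(mean_v) * math.sqrt(n))


def gain_from_adding_u(sigma_w, sigma_u):
    """Factor by which e_f shrinks when the <U> estimate is combined with the <W> one.

    Assumes independent errors and e_U / e_W = sigma_U / sigma_W.
    """
    return math.sqrt(1.0 + (sigma_w / sigma_u) ** 2)


e_f = formal_error_f(sigma_w=100.0, sigma_twv=0.2, mean_v=250.0, n=1000)
print(f"e_f = {e_f:.4f}")  # about 0.063, i.e. 6.3 per cent
```

With $\sigma_U = \sigma_W$ the gain is exactly $\sqrt{2}$; any $\sigma_U > \sigma_W$ makes it smaller, as stated above.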

If initially our distance scale is significantly in error,
our first values of $\langle V\rangle$ will be wrong. The magnitude of the
problem is given by the first term on the right of the second
of equations (9):
$$\langle V_0\rangle \simeq \frac{\langle V\rangle}{1 + f\langle T_{VV}\rangle}, \qquad (44)$$



where the angle brackets around $T_{VV}$ imply the average over
the surveyed region of the sky. Eliminating $\langle V_0\rangle$ between
equations (42) and (44), we obtain
$$\langle W\rangle + W_\odot \simeq \frac{f}{1 + f\langle T_{VV}\rangle}\,T_{WV}\langle V\rangle. \qquad (45)$$



It is now straightforward to determine $f$ from the mean slope
$X$ of the correlation between $\langle W\rangle + W_\odot$ and $T_{WV}\langle V\rangle$:
$$f = \frac{X}{1 - X\langle T_{VV}\rangle}. \qquad (46)$$
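The steps from equation (42) to equation (46) can be exercised on mock data. The following is a minimal sketch under simplifying assumptions that are ours, not the paper's: $U_\odot = W_\odot = 0$; the stars have no net $U_0$ or $W_0$, a heliocentric mean $\langle V_0\rangle = -220$ km s$^{-1}$ and an isotropic 100 km s$^{-1}$ dispersion; they are spread uniformly over the sky; $T = I - \mathbf{n}\mathbf{n}^{\rm T}$; and the sky is divided into an equal-area grid in $l$ and $\sin b$. Every tangential velocity is inflated by a common factor $1 + f$ with $f = 0.2$, and the hypothetical `simple_f_estimate` should recover a value close to it.

```python
import numpy as np


def f_from_slope(slope, mean_tvv):
    """Eq. (46): invert slope = f / (1 + f <T_VV>)."""
    return slope / (1.0 - slope * mean_tvv)


def simple_f_estimate(l, b, V, W, n_l=24, n_b=12):
    """Regress per-bin <W> on T_WV <V> through the origin (W_sun = 0 assumed)."""
    n_v = np.cos(b) * np.sin(l)
    n_w = np.sin(b)
    t_wv = -n_w * n_v            # off-diagonal element of T = I - n n^T
    t_vv = 1.0 - n_v**2
    # Equal-area sky bins: uniform in l and in sin(b).
    il = np.minimum((l / (2 * np.pi) * n_l).astype(int), n_l - 1)
    ib = np.minimum(((np.sin(b) + 1.0) / 2.0 * n_b).astype(int), n_b - 1)
    k = il * n_b + ib
    counts = np.bincount(k, minlength=n_l * n_b)
    ok = counts > 0

    def bin_mean(q):
        return np.bincount(k, weights=q, minlength=n_l * n_b)[ok] / counts[ok]

    x = bin_mean(t_wv) * bin_mean(V)  # T_WV <V> per bin
    y = bin_mean(W)                   # <W> + W_sun per bin
    w = counts[ok]
    slope = np.sum(w * x * y) / np.sum(w * x * x)
    return f_from_slope(slope, t_vv.mean())


# Mock halo-like sample whose distances are all 20 per cent too large.
rng = np.random.default_rng(0)
N, f_true = 200_000, 0.2
l = rng.uniform(0.0, 2 * np.pi, N)
b = np.arcsin(rng.uniform(-1.0, 1.0, N))
n = np.column_stack([np.cos(b) * np.cos(l), np.cos(b) * np.sin(l), np.sin(b)])
v0 = rng.normal(0.0, 100.0, size=(N, 3))
v0[:, 1] -= 220.0                                   # large mean heliocentric V
v_par = np.sum(v0 * n, axis=1)
v_obs = v0 + f_true * (v0 - v_par[:, None] * n)     # tangential part scaled by 1 + f
f_hat = simple_f_estimate(l, b, v_obs[:, 1], v_obs[:, 2])
print(f"recovered f = {f_hat:.3f}")
```

Per-bin means, rather than individual stars, are regressed so that the random $W_0$ leaking into $V$ through $T_{VW}$ averages out; in this toy setup a star-by-star regression would bias the slope. A small residual bias remains because $\langle T_{VV}\rangle$ in equation (44) is an unweighted sky average, whereas the regression weights directions by $T_{WV}^2$.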



This simple-minded approach to the determination of $f$
uses the available information less efficiently than the technique
described in Section 2.2, but it is a good way of detecting
a systematic distance error and its sign before iteratively
correcting the distance scale. This is the approach
that led Schonrich et al. (2011) to suggest that the distances
to low-metallicity stars in the SEGUE dataset were being
systematically overestimated.
Figure 6. The effect of an unbiased Gaussian distribution of
distance errors in a sample of 450 000 disc and 50 000 halo stars.
For comparison we plot the fits via $U$ and $W$ for the corrected
fitting formulae. As the dispersion $\sigma_f$ of the distance errors grows,
the average distance estimate departs quadratically from the truth. For $W$ we show
the fitting line $0.453\sigma_f^2$, obtained for $0.0 < \sigma_f < 0.4$. Beyond
this point the error distribution is no longer Gaussian, because we have to truncate the
Gaussian distribution in order to avoid negative distances.



3 SCATTER IN THE DISTANCE ERRORS 

To this point we have assumed that the distances to all
stars contain the same fractional error, $f$. In reality any
systematic offset will be combined with random scatter, and
we now consider whether in these circumstances the factor
$f$ that we recover from the whole population will equal the
average of the $f$ factors of the stars. In other words, does
our procedure provide an unbiased estimate of $f$?



3.1 The bias in $f$

In fact it is not hard to see from equation (18) that we
must anticipate a tendency to overestimate the mean value
of $f$: stars with $f > 0$ will be ascribed the largest velocities
and will thus tend to dominate the sums implicit in $\langle Wy\rangle$
and $\langle y^2\rangle$. From the perspective that equations (13) describe
linear relations between $U$ and $x$ or $W$ and $y$, stars with
overestimated distances will dominate the ends of the line
and influence our estimate of the line's slope $f$ more strongly
than stars with under-estimated distances, which will cluster
near the middle of the line.

Fig. 6 shows this effect in samples of 450 000 disc and
50 000 halo stars in which the input distances have errors
$f$ that have zero mean but the dispersion $\sigma_f$ that is given
by the horizontal axis. The vertical axis gives the recovered
value of $f$. In both cases we apply the corrections described
in Sections 2.4 and 2.5. The expected tendency for $f$ to
be overestimated in the presence of significant scatter in the
input $f$ values manifests itself in the parabolic shape of the
curves formed by the corrected results.

We can recover this behaviour analytically as follows.
We assume that the stars with a given fractional distance
error $f'$ occur everywhere on the sky, so we can form the
sky-average $\langle Wy\rangle_{f'}$ over just this group of stars. Defining
$$n^2 = (T_{WV}^2 + T_{WU}^2)\,\sigma_W^2, \qquad (47)$$
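the inferred fractional distance error of the population is
$$f = \frac{\int \mathrm{d}f'\,P_f(f')\,\langle Wy\rangle_{f'}}{\int \mathrm{d}f'\,P_f(f')\,\langle y^2 + n^2\rangle_{f'}} = \frac{\int \mathrm{d}f'\,P_f(f')\,f'\langle y^2\rangle_{f'}}{\int \mathrm{d}f'\,P_f(f')\,\langle y^2 + n^2\rangle_{f'}}, \qquad (48)$$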



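where $P_f(f')$ is the probability density function (pdf) of $f'$.
Now we decompose $\langle y^2\rangle$ into the part $\langle y^2\rangle_\perp$ that derives
from tangential velocities and the part $\langle y^2\rangle_\parallel$ that derives
from line-of-sight velocities. Since the inferred tangential velocities
scale like $1 + f$, we then have
$$f = \frac{\int \mathrm{d}f'\,P_f(f')\,f'\,\big[(1+f')^2\langle y^2\rangle_{f'\perp} + \langle y^2\rangle_{f'\parallel}\big]}{\int \mathrm{d}f'\,P_f(f')\,\big[(1+f')^2\langle y^2+n^2\rangle_{f'\perp} + \langle y^2+n^2\rangle_{f'\parallel}\big]}. \qquad (49)$$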



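Setting $P_f \propto \mathrm{e}^{-f'^2/2\sigma_f^2}$ and neglecting the variation of $\langle y^2\rangle_{f'}$
with $f'$, this yields
$$f \simeq \frac{2\sigma_f^2}{1 + \sigma_f^2 + \langle n^2\rangle_\perp/\langle y^2\rangle_\perp + \langle y^2+n^2\rangle_\parallel/\langle y^2\rangle_\perp}. \qquad (50)$$
The parabolic variation of the recovered value of $f$ with
the width $\sigma_f$ of the scatter in individual $f$-values is now
manifest.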
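Equation (50) can be checked against a simple Monte Carlo in its cleanest limit: no measurement noise ($n = 0$) and no line-of-sight contribution, where it reduces to $f \simeq 2\sigma_f^2/(1+\sigma_f^2)$. The sketch below, with hypothetical function names, draws zero-mean Gaussian distance errors, inflates each star's explaining variable $y$ by $1 + f_i$, assigns the distance-induced signal $W_i = f_i y_i$, and fits a single slope through the origin.

```python
import numpy as np


def predicted_f_bias(sigma_f, noise_perp=0.0, par_over_perp=0.0):
    """Eq. (50).

    noise_perp    : <n^2>_perp / <y^2>_perp
    par_over_perp : <y^2 + n^2>_par / <y^2>_perp
    """
    return 2.0 * sigma_f**2 / (1.0 + sigma_f**2 + noise_perp + par_over_perp)


def simulated_f(sigma_f, n_stars=200_000, seed=1):
    """Monte Carlo of eq. (49) with purely tangential y and no measurement noise."""
    rng = np.random.default_rng(seed)
    f_i = rng.normal(0.0, sigma_f, n_stars)  # zero-mean scatter in distance errors
    y0 = rng.normal(0.0, 1.0, n_stars)       # explaining variable at the true distance
    y = (1.0 + f_i) * y0                     # inferred tangential velocities scale as 1 + f
    W = f_i * y                              # part of W induced by each star's own error
    return np.sum(W * y) / np.sum(y * y)     # single least-squares slope through the origin


for s in (0.1, 0.2, 0.3):
    print(s, predicted_f_bias(s), simulated_f(s))
```

For $\sigma_f = 0.2$ the prediction is $0.08/1.04 \approx 0.077$; switching on the extra denominator terms lowers the coefficient of $\sigma_f^2$ below 2.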

The assumption above of a Gaussian distribution
of fractional distance errors is not fully realistic. In
fact, $P_f(f)$ has a long tail at $f > 0$. Stars in this tail
will have seriously overestimated tangential velocities, and
Schonrich et al. (2011) argue that, as a consequence, a halo
sample that in reality has no net rotation can be interpreted
as consisting of two populations, one of which is counter-rotating.



3.2 The second moment of the error distribution 

We can obtain information about the breadth of the distribution
of distance errors by an approach largely similar to
the classical statistical parallax: we compare the square of the
speed $v$ with the square of the line-of-sight velocity $v_\parallel$. For
the measurement of the $i$th star, we have
$$v_i^2 = v_{\parallel i}^2 + F_i^2\,v_{\perp i}^2, \qquad (51)$$



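where $F_i = 1 + f_i$ and $v_{\perp i}$ is the true tangential speed. Summing over the $N$ stars in the sample,
we obtain
$$\frac{\sum_i v_i^2}{\sum_i v_{\parallel i}^2} = 1 + \frac{\langle F^2\rangle\sum_i v_{\perp i}^2 + \sum_i\big(F_i^2 - \langle F^2\rangle\big)v_{\perp i}^2}{\sum_i v_{\parallel i}^2} = 1 + \langle F^2\rangle\frac{\langle v_\perp^2\rangle}{\langle v_\parallel^2\rangle} + \frac{\mathrm{Cov}(F^2, v_\perp^2)}{\langle v_\parallel^2\rangle}. \qquad (52)$$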



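If the distance errors are statistically independent of velocities,
the covariance vanishes. Further, if either the velocity
distribution is isotropic or the sample is uniformly distributed
on the sky, so that $v_\perp$ and $v_\parallel$ sample all three
principal axes of the velocity ellipsoid equally, then $\langle v_\perp^2\rangle = 2\langle v_\parallel^2\rangle$ and
equation (52) yields
$$\langle F^2\rangle \simeq \frac12\left(\frac{\sum_i v_i^2}{\sum_i v_{\parallel i}^2} - 1\right) \qquad \text{(isotropy)}. \qquad (53)$$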



If the velocity distribution is anisotropic and the sky coverage
is non-uniform, this formula will under-estimate $\langle F^2\rangle$
when the sample points towards the longest axis of the velocity
ellipsoid and overestimate it in the contrary case.
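A mock test of equation (53) is sketched below; the function name and all numbers are illustrative assumptions. The velocities are isotropic and the factors $F_i$ are drawn independently of them, so the covariance term of equation (52) vanishes and the estimator should reproduce the sample mean of $F^2$. For an isotropic distribution any line of sight gives the same statistics, so the line of sight is taken along one coordinate axis.

```python
import numpy as np


def second_moment_F(v_sq, v_par_sq):
    """Eq. (53): <F^2> ~ (sum v_i^2 / sum v_par,i^2 - 1) / 2, valid under isotropy."""
    return 0.5 * (np.sum(v_sq) / np.sum(v_par_sq) - 1.0)


rng = np.random.default_rng(2)
N = 200_000
v = rng.normal(0.0, 50.0, size=(N, 3))   # isotropic velocities (assumption)
v_par = v[:, 2]                          # line of sight along the third axis
v_perp_sq = v[:, 0]**2 + v[:, 1]**2      # true tangential speed squared
F = 1.0 + rng.normal(0.1, 0.2, N)        # F_i = 1 + f_i, independent of velocity
v_sq = v_par**2 + F**2 * v_perp_sq       # eq. (51): measured squared speed
est = second_moment_F(v_sq, v_par**2)
true_val = np.mean(F**2)                 # close to 1.1**2 + 0.2**2 = 1.25
print(est, true_val)
```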

Classical statistical parallaxes are obtained under the
assumption of an isotropic velocity dispersion, which is the circumstance
in which equation (53) is most likely to hold, and
clearly this equation is closely related to the classical formula
for a statistical parallax. The main difference is that it yields
the second rather than the first moment of $F$.

The covariance in equation (52) is non-vanishing when
the distance errors are not statistically independent of the
velocities, for example because distances are more likely to
be under-estimated when looking into the plane than when
looking towards a Galactic pole.

In practice the scope for reliable application of equation
(52) is limited, since few samples are uniformly distributed
on the sky and have securely known values of the covariance
term.

A more effective way to determine the scatter in $f$ exploits
the idea introduced at the start of this Section that
stars with overestimated distances tend to have large values
of $|x|$ and $|y|$, while stars with under-estimated distances
have small values of $|x|$ and $|y|$. Consequently, if in
equations (18) or (19) we restrict the sum to stars with
small (resp. large) $x^2$ or $y^2$, we will probe the smallest
(resp. largest) values of $f$ within the sample. By combining
these estimates of $f$ with the numbers of stars associated
with each range of values of $x^2$ or $y^2$, we can construct the
probability distribution $P(f)$ of the overall sample.
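The quantile idea can be illustrated with a mock sample. The sketch below is an illustration, not the authors' procedure: the hypothetical `f_in_y2_quantiles` splits a sample into quartiles of $y^2$ and fits $f = \sum Wy/\sum y^2$ within each, which is the simplest form of the restricted sums described above (the corrections of Section 2 are omitted). Distance errors with zero mean and a dispersion of 0.3 are assumed, and the mock has $W_i = f_i y_i$ plus small noise.

```python
import numpy as np


def f_in_y2_quantiles(W, y, n_bins=4):
    """Slope f = sum(W y) / sum(y^2) within quantile bins of y^2, plus star counts."""
    y2 = y**2
    edges = np.quantile(y2, np.linspace(0.0, 1.0, n_bins + 1))
    idx = np.digitize(y2, edges[1:-1])  # bin index 0 .. n_bins-1
    f_bins = np.empty(n_bins)
    for k in range(n_bins):
        sel = idx == k
        f_bins[k] = np.sum(W[sel] * y[sel]) / np.sum(y2[sel])
    counts = np.bincount(idx, minlength=n_bins)
    return f_bins, counts


rng = np.random.default_rng(3)
N = 100_000
f_true = rng.normal(0.0, 0.3, N)           # zero-mean scatter in fractional distance errors
y0 = rng.normal(0.0, 1.0, N)
y = (1.0 + f_true) * y0                    # overestimated distances inflate |y|
W = f_true * y + rng.normal(0.0, 0.05, N)  # distance-induced signal plus small noise
f_bins, counts = f_in_y2_quantiles(W, y)
print(f_bins, counts)
```

The fitted $f$ rises from the low-$y^2$ to the high-$y^2$ quartile even though the mean error is zero; turning such per-bin values and counts into $P(f)$ is the further step described above, which is not attempted here.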

Whichever approach we adopt to determine the scatter
in $f$, we should take into account the errors in proper motions.
In the first approach they inflate $v_\perp^2$ and hence $v^2$; in the
second approach they push stars to large values of $x^2$ and $y^2$
and thus affect the distance estimator. Fortunately, in relatively
nearby samples the impact of errors in proper motions
is limited.



4 IMPLEMENTATION 

In this section we explain which of the several formulae we
have given for the fractional distance error $f$ we recommend,
and in what order they should be used. We then illustrate
the procedure by applying it to a sample of stars from
the Sloan Extension for Galactic Understanding and Exploration
(SEGUE; Yanny et al. 2009) and a sample from Data
Release 8 of the Sloan Digital Sky Survey (Eisenstein et al.
2011; Aihara et al. 2011).

In any real data set there are likely to be stars with
implausibly large heliocentric velocities, and the first step
should be to discard them. We discard stars with
extreme Galactocentric velocities, i.e. $|U|$ or $|V| > 800$ km s$^{-1}$,
or $|W| > 400$ km s$^{-1}$. Then we bin the stars by some quantity
of interest, such as surface gravity, metallicity or
$V_\phi$, and for each bin use equation (19) iteratively to determine
$f$ for that group. The values of $\langle Wy\rangle$ used in this
equation are the raw values from the data minus the corrections
$\langle e_W e_y\rangle$ from equation (25) and the corrections for
rotation of the velocity ellipsoid from equations (40) and
(41). Once this stage in the analysis has been completed,
the distances of stars have been corrected for the most important
errors in the original data, and we may assume that
any residual systematic errors are small.

Figure 7. Values of $f$ for the sample of $\sim$20 000 main-sequence
stars of Carollo et al. (2010) (red curve) and for three
different selections from our SEGUE sample. A mask containing
at least 1600 stars and spanning at least 25 km s$^{-1}$ was moved
over the sample in steps of 200 stars. The dotted lines delimit
the formal 1$\sigma$ error bands associated with these estimates. In the
lower panel we plot the corrections to $f$ for proper-motion errors
(negative, solid lines) and for the rotation of the velocity ellipsoid (positive,
dashed lines) as defined in eq. (40).
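The first two steps of this recipe can be sketched as follows. The thresholds are those quoted above; everything else is an assumption: the function names are hypothetical, and `estimate_f` is a placeholder for the iterative use of equation (19) with the corrections of equations (25), (40) and (41), which is not implemented here.

```python
import numpy as np


def velocity_outlier_mask(U, V, W, uv_max=800.0, w_max=400.0):
    """True for stars kept: |U|, |V| <= 800 km/s and |W| <= 400 km/s."""
    return (np.abs(U) <= uv_max) & (np.abs(V) <= uv_max) & (np.abs(W) <= w_max)


def f_per_bin(quantity, edges, estimate_f, **columns):
    """Group stars by `quantity` (e.g. log g) and call a user-supplied estimator per bin.

    `estimate_f` receives the per-bin slices of the keyword columns; it stands in
    for the iterative eq. (19) and is not implemented here.
    """
    idx = np.digitize(quantity, edges) - 1  # -1 or len(edges)-1 means outside all bins
    results = {}
    for k in range(len(edges) - 1):
        sel = idx == k
        if sel.any():
            results[k] = estimate_f(**{name: col[sel] for name, col in columns.items()})
    return results
```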

4.1 Samples used

We make use of two subsamples from the SEGUE survey.
Our main sample consists of a raw data set of 224 019
stars from the eighth Sloan data release (DR8; Aihara et al.
2011). Since we want as many stars as possible rather than
a specific subset, and are not concerned about metallicity
biases in the sample, we use all stars with clean photometry
from target-selection schemes that do not include any direct
kinematic (i.e. proper-motion or line-of-sight velocity) bias.
To ensure decent quality of the kinematics, we follow
Munn et al. (2004) in requiring a match in the proper-motion
identifications (match = 1) and a good position determination
($\sigma_{\rm RA}, \sigma_{\rm DEC} < 350$ mas). To ensure sensible stellar
parameters, we require an average signal-to-noise ratio larger than
10. Further, we require the formal errors on both proper-motion
components to be moderate, below 4 mas yr$^{-1}$. We exclude any
star that lacks a metallicity or a proper motion or is flagged
as having an unusual spectrum (Lee et al. 2008a), unless the
flag indicates carbon enhancement. To eliminate a handful
of objects with colours far outside the normal calibration
ranges we require $\ldots < (g-i)_0 < 1$. Only stars that would be
within 4 kpc in the first-guess distance determination and
that pass our criteria for not being velocity outliers are used.
When adopting the Ivezic et al. (2008) (A7) main-sequence
distance calibration, a total of 119 577 stars pass these cuts.
Velocities are derived as in Schonrich et al. (2011) with an
adopted solar Galactocentric radius of $R_0 = 8$ kpc, a circular
speed of 220 km s$^{-1}$ and the solar motion relative to the local
standard of rest from Schonrich et al. (2010). For this work
we make use of the dereddened Sloan photometric colours
provided in the catalogue.
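For concreteness, the cuts listed above can be expressed as a boolean mask. This is a sketch only: the dictionary keys are hypothetical placeholders rather than SDSS column names, and because the lower limit of the $(g-i)_0$ range is not given in this text it is left as an optional parameter.

```python
import numpy as np


def quality_mask(cat, gi_min=None, gi_max=1.0):
    """Boolean mask for the Section 4.1 cuts on a dict of NumPy arrays.

    All keys are hypothetical placeholders, not SDSS column names.
    gi_min: lower (g-i)_0 limit; not specified here, so skipped when None.
    """
    keep = (
        (cat["match"] == 1)                                       # Munn et al. (2004) match
        & (cat["sigma_ra"] < 350.0) & (cat["sigma_dec"] < 350.0)  # position errors, mas
        & (cat["snr"] > 10.0)                                     # average signal-to-noise
        & (cat["sigma_pm1"] < 4.0) & (cat["sigma_pm2"] < 4.0)     # proper-motion errors, mas/yr
        & np.isfinite(cat["feh"])                                 # has a metallicity
        & np.isfinite(cat["pm1"]) & np.isfinite(cat["pm2"])       # has a proper motion
        & (~cat["unusual"] | cat["carbon"])                       # keep carbon-enhanced flags
        & (cat["gi0"] < gi_max)
        & (cat["d_first_guess"] < 4.0)                            # first-guess distance, kpc
    )
    if gi_min is not None:
        keep &= cat["gi0"] > gi_min
    return keep
```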

The sample of Carollo et al. (2010), which comprises
$\sim$30 000 calibration stars from SEGUE, constitutes our
second sample. Its parameters derive from an earlier version
(DR7; Abazaian et al. 2009) of the SEGUE parameter
pipeline, but are consistent with the new data release. Although
their sample is merely a subset of our larger
sample, its distances are overestimated (as
shown by Schonrich et al. 2011), and re-detecting this bias
illustrates the potential of the method presented here.

4.2 Mapping the samples in azimuthal velocity 

Fig. 7 shows the results of binning the main-sequence stars
of four subsamples of SEGUE stars by azimuthal velocity
(with the Sun at $V_\phi = 232$ km s$^{-1}$). The upper panel shows
values of $f$, while the lower panel shows the corrections used
to obtain these $f$-values.

The full lines in the bottom panel of Fig. 7 show the
corrections to $f$ that are required to account for proper-motion
errors; the impact of errors in $v_\parallel$ is negligible and is
not plotted. Proper-motion errors tend to increase the recovered
value of $f$, so they require a negative correction to $f$.
Their importance peaks around the solar velocity because they
contribute a roughly constant term to ${\rm Cov}(W, y)$, while the
typical heliocentric velocities of stars, which provide our signal,
shrink as $V_\phi$ tends to the Sun's value, both because
the offset in the rotational component diminishes and because
the velocity dispersion of disc stars diminishes as $V_\phi$
approaches the circular speed.

The dashed lines in the bottom panel of Fig. 7 show the
corrections to $f$ that are required to account for the rotation
of the velocity ellipsoid. These curves have two peaks
because there is a similar competition between decreasing
heliocentric velocities and the decreasing size of the velocity
ellipsoid that drives the correction term.

The full red curve in the upper panel of Fig. 7 shows
the values of $f$ yielded by equation (19) when distances from
Carollo et al. (2010) are used for their "main-sequence" stars; the
dashed red curves show the error bounds on $f$. We see that
$f$ is significantly greater than zero for $V_\phi < 0$, the region of
retrograde rotation, implying the presence of significant distance
overestimates. At $V_\phi > 0$, $f$ drops slightly below zero.
The full green curve shows the corresponding values of $f$ for
the full SEGUE sample when distances are obtained from
the Ivezic et al. (2008) (A7) main-sequence relation. Since
this sample is much larger, the formal error bounds
are tighter than in the case of the Carollo et al. sample.
Now at $V_\phi > 0$, $f$ is decidedly negative ($\sim -0.3$), implying
the presence of significant distance under-estimates. The $f$-value
of a sample is an average distance correction, so a given
value of $f$ could imply that all stars have the corresponding
distance mis-estimate, or that a fraction of the stars have
a larger mis-estimate while the bulk of the stars have good
distances. The full blue curve in the upper panel of Fig. 7
shows the values of $f$ obtained when the all-star sample is
restricted to dwarfs by imposing $\log g > 4$:
with this cut the distances are under-estimated by only $\sim$10
per cent, because the gravity cut eliminates most subgiants
and giants from the sample. The black line shows the same
"dwarf" sample with the additional restriction that the
primary distance estimate be $d' < 2$ kpc. This cut mostly
removes relatively blue stars, which tend to lie on the blue side
of the turn-off point. As the black line in the lower panel shows,
the impact of proper-motion errors is then greatly reduced, because it is
proportional to the square of the estimated distance.
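To see why the distance cut matters, note that a proper-motion error $\sigma_\mu$ at distance $d$ produces a tangential-velocity error $K d\,\sigma_\mu$, where $K \approx 4.74047$ km s$^{-1}$ per (mas yr$^{-1}$ kpc) is the standard unit conversion; its variance, which is what enters the correction, therefore scales as $d^2$. A minimal sketch (the function name is hypothetical, and 4 mas yr$^{-1}$, the upper limit of the cut in Section 4.1, is used only for illustration):

```python
K = 4.74047  # km/s corresponding to 1 mas/yr at 1 kpc


def tangential_velocity_error(d_kpc, sigma_mu_masyr):
    """1-sigma tangential-velocity error (km/s) from a proper-motion error."""
    return K * d_kpc * sigma_mu_masyr


for d in (1.0, 2.0, 4.0):
    err = tangential_velocity_error(d, 4.0)
    print(f"d = {d:.0f} kpc: velocity error {err:5.1f} km/s, variance {err**2:7.0f} (km/s)^2")
```

Halving the distance limit from 4 to 2 kpc cuts the variance contributed by a given proper-motion error by a factor of four.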

In light of this finding we conclude that the deep trough
in the green curve for the all-star sample arises because that
sample is severely contaminated by subgiants and giants.
We can probe the extent of the contamination by dissecting
a sample in velocity space because, as we saw in Section
3.1, stars with overestimated distances assemble at extreme
velocities, while stars with under-estimated distances are
dragged towards the solar motion. This is why the curves
of the two contaminated samples (the Carollo et al. sample
and the all-star sample) slope steeply downward from left
to right in the upper panel of Fig. 7. The slope of the curve
for the cleaner sample produced by the gravity cut is much
smaller. By the same reasoning we can even interpret the minor difference
between the black and the blue curves: because of the
inclination of the main sequence, the distance cut preferentially
removes relatively bright blue stars from the sample,
which have a larger spread in estimated distances.

The sudden rise of $f$ at super-solar $V_\phi$ has a different
cause. These stars are few in number and have small heliocentric
velocities, and, as the lower panel of Fig. 7 shows,
their $f$-values are strongly affected by the assumed proper-motion
errors. It is likely that our assumption of a constant
proper-motion error, which is probably false, has biased the $f$-values for
these stars. By contrast, the $f$-values of stars with $V_\phi$
substantially smaller than the solar value are insensitive to the
handling of proper-motion errors.

While valuable insights can be obtained by examining
$f$ as a function of velocity, a word of caution about such
dissection is in order. No cut or selection should directly
affect the target variable (here $W$), and cuts on the explaining
variable can introduce artifacts that should be explored
with mock data. For example, cutting in the heliocentric $V$
velocity instead of $V_\phi$ introduces a velocity-dependent error
in $f$ of order $\sim$5 per cent. In this case the bias arises from
the rotation of the velocity ellipsoid: selection in $V$ biases $U$,
which in turn biases $W$ through the vertical tilt of the ellipsoid.



4.3 Dissecting the main sample in gravity 

By partitioning a sample in gravity we can explore the extent
to which it contains stars at different evolutionary
stages, since these should fall into different bins in gravity. In
the following we use only the A7 calibration of Ivezic et al.
(2008).

The first three panels of Fig. 8 show results obtained by
splitting the $\sim$120 000 stars into three ranges of metallicity,
with boundaries at [Fe/H] $= -1.2$ and $-0.5$, then splitting
the stars within each metallicity group in $\log g$, and finally
binning them in colour. Each colour bin has at least 1200 stars
(the ones at the edges carry at least 800 and 400 objects), and
from one bin to the next 400 stars are dropped, so every third
data point is independent. Points are plotted at the average
colour of the stars in the bin. Most giant stars in this sample
have low measured metallicities, so the low-gravity bins are
only well populated for the most metal-poor stars. The error
bars indicate the formal errors on $f$ plus an error of 30 per cent in
the corrections to $f$ for proper-motion errors and rotation
of the velocity ellipsoid.
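The moving mask can be sketched as below; this is an illustration rather than the authors' code. The hypothetical `f_in_colour_windows` sorts stars by colour, advances a window of 1200 stars in steps of 400, and fits a slope $f = \sum Wy/\sum y^2$ in each window; the smaller edge bins and all corrections are omitted. Because consecutive windows overlap by 800 stars, windows $k$ and $k+3$ are the first pair that share no stars, which is why only every third point is independent.

```python
import numpy as np


def f_in_colour_windows(colour, W, y, width=1200, step=400):
    """Slope f = sum(W y) / sum(y^2) in a window of `width` stars sliding along colour.

    Returns (mean colour, f) per window; partial windows at the edges are omitted.
    """
    order = np.argsort(colour)
    colour, W, y = colour[order], W[order], y[order]
    out = []
    for start in range(0, len(colour) - width + 1, step):
        sl = slice(start, start + width)
        out.append((colour[sl].mean(), np.sum(W[sl] * y[sl]) / np.sum(y[sl] ** 2)))
    return out


rng = np.random.default_rng(4)
colour = rng.uniform(0.2, 1.0, 4000)
y = rng.normal(0.0, 50.0, 4000)
windows = f_in_colour_windows(colour, 0.1 * y, y)  # toy data with f = 0.1 everywhere
print(len(windows), windows[0])
```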

Since the distances employed assume that every star is
on the main sequence, giants have severe distance under-estimates
(negative $f$). In the top two panels of Fig. 8 one
can assess the colour at which stars move up from the subgiant
branch to the giant branch; the precision with which
this colour can be determined is increased if the sample is
not divided by gravity or metallicity. The distance under-estimates
indicated by Fig. 8 are similar to those we would
expect a priori, but the agreement is imperfect because the
giants in this sample are very remote, so proper-motion errors
have a big impact on kinematically determined distances.

In the literature SEGUE stars with gravities in the range
$3.0 < \log g < 3.5$ are considered subgiants (see e.g.
Carollo et al. 2010). Stars with $3.5 < \log g < 4.0$ were classified
as turn-off stars until Schonrich et al. (2011) showed
that this practice sorts stars into unphysical positions
in the colour-gravity plane (at the relevant low metallicities,
the turn-off region should end blueward of $(g-i)_0 = 0.4$).
More recent studies (Beers et al. 2011; Carollo et al. 2011)
classify the stars with $3.5 < \log g < 3.75$ as subgiants
and the higher-gravity objects as main-sequence stars. However,
the purple and blue points in the upper two panels of
Fig. 8 show that it cannot be the case that all stars with
$\log g < 3.75$ are subgiants, both because at $(g-i)_0 > 0.4$
the $f$-values of the stars with $3.5 < \log g < 3.75$ are significantly
less negative than those of stars with $\log g < 3.5$, and
because the $f$-values of this higher-gravity subsample are no
smaller than $\sim -0.3$. This corresponds to their being more
luminous than main-sequence stars of the same colour by
less than a magnitude, whereas, depending on metallicity,
already at $(g-i)_0 \sim 0.4$ subgiants should be more luminous
than main-sequence stars by more than 1.5 magnitudes. We
conclude that no reliable selection of subgiants is feasible
with the current gravities: in general there is contamination
by dwarf stars (with the well-known effect of distance
overestimates mimicking kinematically hot retrograde populations)
and, at least on the red side, we have to expect some
contamination by giants.
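The link between $f$ and a luminosity offset used in this argument is simple arithmetic. With the convention that the estimated distance is $(1+f)$ times the true one, a star whose distance is under-estimated by the main-sequence relation is brighter than assumed by $5\log_{10}(1+f)$ magnitudes (a negative number). A short sketch with hypothetical function names:

```python
import math


def magnitude_offset(f):
    """Magnitude difference implied by a fractional distance error f.

    Convention: estimated distance = (1 + f) * true distance, so f < 0 means the
    star is more luminous than assumed and the result is negative (brighter).
    """
    return 5.0 * math.log10(1.0 + f)


def f_for_offset(delta_mag):
    """Inverse of magnitude_offset."""
    return 10.0 ** (delta_mag / 5.0) - 1.0


print(round(magnitude_offset(-0.3), 2))  # -0.77: less than one magnitude brighter
print(round(f_for_offset(-1.5), 2))      # -0.5: what a 1.5 mag subgiant offset would imply
```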

The main-sequence relation appears to describe the distances
of stars with measured $\log g > 3.75$ relatively well.
Yet, especially in the top right panel of Fig. 8, we see that
for all metallicity subsamples $f$ tends to increase bluewards.
This arises because the colour-luminosity relation
we have used is inclined relative to the theoretical zero-age
main sequence and assigns quite high luminosities, and
consequently large distances, to blue stars relative to their
red counterparts.

Figure 8. An evaluation of the performance of gravities in SEGUE DR8 and of the Ivezic et al. (2008, A7) calibration
against metallicity and colour. Each of the first three panels shows results for stars in a restricted range of metallicity. Each metallicity
group was then divided by surface gravity and $g-i$ colour, and $f$ was determined for each group from equation (19). We move a mask 1200 stars
wide over the sample in steps of 400 objects, so that every third data point is fully independent. Error bars give the formal error
plus a 30 per cent error on the systematic corrections. The bottom right panel shows the corrections made to the $f$-values of high-gravity stars
of various metallicities for proper motions (solid lines) and the rotation of the velocity ellipsoid (dashed lines).

Fig. 8 enables us to choose the lower limit on gravity
that will most effectively minimise contamination of the final
sample by stars that are not dwarfs. This limit appears
to rise from $\log g \sim 4.1$ at the lowest metallicities
to $\log g \sim 4.4$ at the highest metallicities. Part of the
trend may also be connected to the redward shift of the turn-off
with metallicity. However, this conclusion should not be
blindly transferred to catalogues other than DR8, because in
this parameter derivation measured gravity is likely to be correlated
with metallicity, so to some extent we may be seeing errors
in the assumed luminosity that arise from errors in
metallicity. Fig. 8 also enables us to detect the redward shift
of the turn-off colour with increasing [Fe/H], as the colour
at which the dark-blue points of lower-gravity stars become
clearly separated from the black points of dwarf stars. Also,
blueward of the turn-off we expect an increased spread in
values of $f$ within the highest-gravity bins, because the SEGUE
stellar parameter pipeline retains some residual information
on how high above the main sequence a star is placed in
gravity.



The bottom-right panel of Fig. 8 shows the corrections
to $f$ required by proper-motion errors (solid lines) and rotation
of the velocity ellipsoid (dashed lines). The impact of
proper-motion errors on the most metal-weak stars is small
because these stars are in the halo and have large heliocentric
velocities. Rotation of the velocity ellipsoid has a similar
impact on stars of all metallicities, because these stars are
distributed through broadly the same volume and a higher
velocity dispersion inflates both the correction and the signal
in $f$. For this plot the velocity dispersions entering the correction were obtained by measuring
the dispersion in each subsample and then assuming a
constant dispersion in the lowest-metallicity bin; in
the other metallicity bins we assumed that the dispersion increases
by 15 per cent for each kiloparsec in $|z|$ and
that $\sigma \propto \exp(-R/R_\sigma)$ with $R_\sigma = 7.5$ kpc. This correction
term is small, and minor changes in how it is derived
will not alter our results.
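The dispersion model in the previous paragraph can be written as a small function. Two details are assumptions rather than statements from the text: the measured dispersion is taken to apply at $R = R_0 = 8$ kpc and $z = 0$, and the 15 per cent increase per kiloparsec is treated as linear in $|z|$ rather than compounding. The function name is hypothetical.

```python
import numpy as np


def sigma_model(sigma_meas, R, z, metal_poor=False, R0=8.0, R_sigma=7.5, per_kpc=0.15):
    """Velocity dispersion assumed for the ellipsoid-rotation correction.

    sigma_meas : dispersion measured in the subsample (assumed to apply at R0, z = 0)
    R, z       : Galactocentric radius and height [kpc]
    metal_poor : True for the lowest-metallicity bin, where the dispersion is constant
    """
    R = np.asarray(R, dtype=float)
    if metal_poor:
        return np.full_like(R, sigma_meas)
    # Linear 15 per cent rise per kpc in |z| (assumption) times exp(-(R - R0)/R_sigma).
    return sigma_meas * (1.0 + per_kpc * np.abs(z)) * np.exp(-(R - R0) / R_sigma)
```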



5 CONCLUSIONS 

Systematic distance errors give rise to correlations between
the measured components $(U, V, W)$ of heliocentric velocities.
Similar correlations arise from three other sources:
(i) measurement errors in the proper motions, (ii) Galactic
streaming motions and (iii) dependence of the orientation of
the local velocity ellipsoid on position in the Galaxy. However,
each of these sources of correlation between $(U, V, W)$
has a different and known pattern of variation over the sky,
so provided the data come from a wide-area survey, we can
disentangle their effects. We have described an iterative procedure
by which the distances to stars are rescaled until
correlations between $(U, V, W)$ are fully accounted for by effects
(i)-(iii) above, and the contribution from systematic
distance errors vanishes.

The procedure works best when the group of stars under
study has a large net motion with respect to the Sun. This
net motion is often dominated by azimuthal streaming, since
the Sun has more angular momentum than a circular orbit,
while nearby thick-disc and halo stars tend to have substantially
less angular momentum. When azimuthal streaming
is dominant, the simpler formulae of Section 2.6 apply. For
thin-disc stars with angular momentum similar
to that of the Sun, the other velocity components usually
still carry sufficient information to assess distances.

In principle we can determine distance errors by using
either $U$, $V$ or $W$ as a "target" variable, with the "explaining"
variable being composed of the other components of
velocity and the sky coordinates. In practice $V$ should not
be used as a target variable, because the systematic variation of
$V$ with position in the Galaxy would induce spurious
correlations with the angle terms connecting it to the
explaining velocity components. $W$ is the target variable of
choice, both because it has the smallest velocity dispersion
and because it is least affected by the complexities of differential
rotation. $U$ is mainly useful as a target variable for
its ability to determine the mean rotation rate of a population
once the distance scale has been corrected by exploiting
$W$. We will discuss an application of this rotation term in a
forthcoming paper.

There are some restrictions on the applicability of the
method that should be borne in mind when using it. The proper
motions need to be unbiased, and their errors should have
finite and approximately known variances. These conditions
seem to be satisfied by data from the SDSS (Dong et al.
2011). If the sample is non-local, we need to estimate the
extent of rotation of the velocity ellipsoid within the sampled
region. Such an estimate can be obtained from the sample
itself, but with some residual uncertainty arising from
proper-motion errors, which particularly affect remote stars.

Streams and a warp will induce unwanted correlations,
but the likelihood of these giving rise to an erroneous distance
scale is small for several reasons. First, for a stream
or warp to undermine the method, the correlations it introduces
must vary on the sky in a similar way to the correlations
associated with distance errors. Consequently, the
impact of a stream or warp is likely to be suppressed given
sufficient sky coverage. Second, a warp could be accounted
for in much the same way as we have accounted for Galactic
rotation. Third, the footprints of streams or a warp will show
up as conflicting values of $f$ obtained from the two possible
target velocities, $W$ and $U$. Finally, a stream or warp would
induce identical correlations in the velocities of stars across the
broad colour and gravity range that made up the physical
feature, whereas distance mis-estimates will usually vary
with spectral type.

In Section 3.1 we showed that the estimators given in
Section 2 are mildly biased: when there is
scatter in the distribution of distance errors, the estimated
value of $f$ will be larger than it should be by an amount
quadratic in the width of the distribution. Equation (50) can
be used to correct for this effect. As we discussed in Section
3.2, the method can be extended to probe the full probability
distribution of $f$ values rather than just determining the
mean value of $f$. Details of this extension will be given in a
later paper.

In Section 4 we applied the method to samples of stars
from the SEGUE survey. We concluded that the distances
to stars used by Carollo et al. (2010) are on average significantly
overestimated among stars deemed to be counter-rotating,
and tend to be under-estimated by $\sim$10 per cent
near solar velocity. This is also a nice example of how a
spread in the distance errors within a sample can be seen
directly by eye when we dissect the sample in velocity: the distance
overestimates assemble at velocities remote from the
solar value (i.e. mostly in the retrograde tail of the halo velocity
distribution), while in our all-star sample, which is
contaminated by numerous giants, the giants are dragged
towards the solar motion, so a trough forms in a plot of
the correction factor $f$ versus azimuthal velocity. We also
warn against mistaking the derived values for a direct estimate
of absolute magnitudes. Apart from contamination,
the method also corrects for the mean reddening error and the
presence of unresolved binaries, so in general it can be expected
to give slightly larger mean distances than are appropriate
for single stars.

In Section 4.3 we demonstrated the use of the method
to assess the reliability of the Ivezic et al. (2008) A7 distance scale
for dwarfs, and to assess the degree of contamination by non-dwarfs
that arises as the lower limit on $\log g$ for entering
the sample is varied. For low contamination the lower limit
on $\log g$ should increase with metallicity. We conclude that
using only the DR8 gravities it is not possible to achieve a
satisfactory selection of subgiants. The level of contamination
by dwarf stars becomes large once the upper limit on $\log g$
exceeds $\sim$3.5. Since any dwarf that is misidentified as a
subgiant has a seriously overestimated distance, studies of
stellar kinematics that are based on DR8 gravities should
rigorously exclude subgiants.

We are currently applying the method to recent distances
to stars in the RAVE survey (Zwitter et al. 2010;
Burnett et al. 2011). A wide variety of applications of this
method will follow, as it offers a standard tool to identify
groups of stars with problematic parameters, to check the
reliability of selection schemes and distance assignments, and
finally to correct for any biases in these distances, e.g. those caused by
deviations in reddening with distance from the Sun.

Our study has also illuminated the kinematic patterns 
that distance errors can generate. These are not limited to 
the production of spurious counter-rotating components, but 
include tilts of the velocity ellipsoids, and by allowing rota- 
tional velocity to masquerade as motion in either the radial 
or vertical direction, can extend to patterns of mean motion 
that, in a sample that is anisotropically distributed on the 
sky, can imply a wrong motion of the local standard of rest. 